Discovery Proteomics


Discovery proteomics aims to identify as much of the proteome as possible in a particular cellular context. Bottom-up proteomics digests the proteins and analyses the resulting peptides. A bottom-up experiment involves extraction of proteins, digestion of the proteins into peptides, separation by liquid chromatography and analysis by mass spectrometry. The mass spectrometer first detects the peptide masses and intensities. In a data-dependent analysis, the most intense peptides are isolated, fragmented and detected. These fragmentation spectra allow the sequence of each peptide to be determined through peptide spectral matching. Identifying the peptide sequences allows the proteins of origin to be inferred. The relative intensities of the peptides are then used to estimate the relative abundance of the proteins. Comparing relative protein abundances between different cellular states allows inference of the processes at work within those states.

Sample Preparation

Protein Extraction

Extract and enrich proteins from biological samples


Desalt and clean up peptide sample before LC-MS

Protein Quantitation

Determine the quantity of protein in a sample

Sample Fractionation

Fractionate samples before mass spectrometry analysis to increase the number of proteins identified

Protein Digestion

Digest proteins into peptides for detection by mass spectrometry

Liquid Chromatography

Liquid chromatography coupled to mass spectrometry separates the sample over time, reducing its complexity at any given moment to get the most out of the analysis.

Isotopic Labelling

Label samples with isotopic or isobaric labels, thereby enabling multiplexing of samples and increased accuracy of relative abundance comparisons.

Mass Spectrometry

Detect the mass/charge ratio and abundance of ionised analytes.


Data Generation

Once the samples have been prepared, the peptides are analysed by mass spectrometry. There are many types of mass spectrometer, but their general function is similar. A mass spectrometer consists of three parts: a source, an analyser and a detector. The source is where the peptides are ionised and introduced into the mass spectrometer. The analyser separates the peptide ions based on their mass/charge ratio. The detector detects the analytes once they have been separated. The resulting data are a mass/charge ratio and an intensity value for each analyte. In tandem mass spectrometry, samples are measured twice. In the first round (MS1), peptide mass/charge ratios and intensities are measured. In the second round (MS2), highly abundant peptides are individually isolated and fragmented by colliding them with an inert gas. A spectrum of the fragments is then collected. The differences in mass between successive fragments correspond to the masses of amino acids. Using peptide spectral matching it is possible to determine the sequence of the peptides and the identity of the proteins they originate from.
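To make the link between fragment spacing and amino acid mass concrete, here is a minimal Python sketch that computes the mass/charge ratio of a peptide and its singly charged b and y fragment ions. The residue masses are standard monoisotopic values; the peptide PEPTIDE and all function names are illustrative assumptions, and modifications, neutral losses and other ion types are ignored.

```python
# Monoisotopic residue masses in daltons (standard values, 5 decimal places).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # mass of H2O added to the summed residues of a peptide
PROTON = 1.007276   # mass of the charge carrier


def precursor_mz(peptide, charge):
    """Mass/charge ratio of the intact peptide, as measured in MS1."""
    neutral_mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (neutral_mass + charge * PROTON) / charge


def b_ions(peptide):
    """Singly charged b ions (N-terminal fragments b1 .. b(n-1))."""
    ions, running = [], PROTON
    for aa in peptide[:-1]:
        running += RESIDUE_MASS[aa]
        ions.append(running)
    return ions


def y_ions(peptide):
    """Singly charged y ions (C-terminal fragments y1 .. y(n-1))."""
    ions, running = [], WATER + PROTON
    for aa in reversed(peptide[1:]):
        running += RESIDUE_MASS[aa]
        ions.append(running)
    return ions


peptide = "PEPTIDE"  # illustrative sequence, not from a real experiment
b = b_ions(peptide)

# The gap between neighbouring b ions is the mass of the residue added,
# which is how a fragment ladder can be read back into a sequence.
gaps = [b[i + 1] - b[i] for i in range(len(b) - 1)]
for residue, gap in zip(peptide[1:-1], gaps):
    print(f"{residue}: {gap:.5f} Da")

print(f"[M+2H]2+ m/z: {precursor_mz(peptide, 2):.4f}")
```

Leucine and isoleucine share the same mass in the table, which is why they cannot be told apart by fragment spacing alone.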

Mass spectrometers determine the mass/charge and abundance of ionised analytes.

Thermo Fisher LTQ Velos

Combines the proven mass accuracy and ultra-high resolution of the Orbitrap mass analyzer with the increased sensitivity and improved cycle time of the LTQ Velos.

Thermo Fisher Orbitrap Fusion

Orbitrap Fusion™ Tribrid™ Mass Spectrometer. This instrument combines the best of quadrupole, ion trap and Orbitrap mass analysis in a revolutionary Tribrid architecture to provide unprecedented depth of analysis and ease of use.

Thermo Fisher Q Exactive

Hybrid Quadrupole-Orbitrap Mass Spectrometer. This benchtop LC-MS/MS system combines quadrupole precursor ion selection with high-resolution, accurate-mass (HRAM) Orbitrap detection to deliver exceptional performance and versatility.

Data Analysis

Mass spectrometer raw data consist of a list of masses and intensities detected at different times or scans. These need to be matched to known analytes in order to be useful. In bottom-up data-dependent proteomics, the mass spectra are matched to theoretical peptides from a protein database. A FASTA file is downloaded from a protein database for the organism being studied. The proteins in the FASTA file are digested in silico using the same enzyme as in the experiment. The measured spectra are then matched to theoretical spectra generated for the in silico peptide sequences. The matches are scored. Using a decoy database consisting of random or reversed sequences, decoy matches are also generated. The decoy matches are used to estimate a false discovery rate, which in turn is used to identify the peptides that are most likely true matches. Each peptide is associated with the intensity of the MS1 peaks for that peptide.
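The in silico digestion step can be sketched in a few lines of Python. This is a minimal illustration that assumes trypsin as the enzyme, using the commonly applied rule of cleaving after lysine (K) or arginine (R) unless the next residue is proline (P). The FASTA record, the function names and the `missed_cleavages` and `min_length` parameters are hypothetical; search engines implement their own, more complete digestion rules.

```python
# Hypothetical FASTA text; a real file would be downloaded from a protein database.
FASTA_TEXT = """>sp|TEST1|EXAMPLE_PROTEIN hypothetical record
AAKPGGRCCK
DDREE
"""


def parse_fasta(text):
    """Return {header: sequence} from FASTA-formatted text."""
    records, header = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line  # sequences may span several lines
    return records


def tryptic_digest(sequence, missed_cleavages=0, min_length=1):
    """Cleave after K or R (not before P), allowing some missed cleavages."""
    sites = [i + 1 for i in range(len(sequence) - 1)
             if sequence[i] in "KR" and sequence[i + 1] != "P"]
    bounds = [0] + sites + [len(sequence)]
    peptides = []
    for start in range(len(bounds) - 1):
        for missed in range(missed_cleavages + 1):
            end = start + 1 + missed
            if end >= len(bounds):
                break
            peptide = sequence[bounds[start]:bounds[end]]
            if len(peptide) >= min_length:
                peptides.append(peptide)
    return peptides


for header, sequence in parse_fasta(FASTA_TEXT).items():
    print(header)
    print(tryptic_digest(sequence, missed_cleavages=1))
```

Note that the K in AAKPGGR is followed by P, so no cleavage is made there.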


Proteomics databases which contain protein sequences for peptide spectral matching


UniProt proteomes contain protein sequences from organisms whose genomes have been sequenced. Some of these protein sequences are manually curated, while others are automatically annotated.

Joint Genome Institute

Genomes of microbes, fungi, algae and plants


Genome browser for vertebrate genomes


NCBI Reference Sequence Database:
"A comprehensive, integrated, non-redundant, well-annotated set of reference sequences including genomic, transcript, and protein."

Peptide Spectral Matching

Tandem mass spectra matched to theoretical spectra from protein sequence database
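The target-decoy filtering described under Data Analysis can be sketched as follows. This is a simplified illustration, assuming reversed sequences as decoys and estimating the false discovery rate as the number of decoy matches divided by the number of target matches above a score cut-off. The scores and function names are made up for the example; real search engines use their own scoring and more refined estimates such as q-values.

```python
def reverse_decoy(sequence):
    """Build a decoy sequence by reversing a target protein sequence."""
    return sequence[::-1]


def fdr_at_threshold(psms, threshold):
    """Estimate FDR as decoys / targets among matches scoring >= threshold.

    psms is a list of (score, is_decoy) tuples, one per spectrum match.
    """
    targets = sum(1 for score, is_decoy in psms
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms
                 if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


def score_cutoff(psms, max_fdr=0.01):
    """Lowest score threshold whose estimated FDR is at most max_fdr."""
    best = None
    for threshold in sorted({score for score, _ in psms}, reverse=True):
        if fdr_at_threshold(psms, threshold) <= max_fdr:
            best = threshold
    return best


# Hypothetical matches: (score, is_decoy)
psms = [
    (9.1, False), (8.7, False), (8.2, False), (7.9, False), (7.5, True),
    (7.1, False), (6.8, False), (6.0, True), (5.5, False), (5.1, True),
]

cutoff = score_cutoff(psms, max_fdr=0.2)
accepted = [score for score, is_decoy in psms
            if score >= cutoff and not is_decoy]
print(f"cut-off {cutoff}: {len(accepted)} target matches accepted")
```

Lowering `max_fdr` raises the cut-off and keeps fewer, more reliable matches.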


"Progenesis QI for proteomics is discovery analysis software for your LC-MS data; a revolutionary ‘difference engine’ that works in a unique way to help you to answer your biological question."


Peaks Studio

" PEAKS Studio is a software platform with complete solutions for discovery proteomics, including protein identification and quantification, analysis of post-translational modifications (PTMs) and sequence variants (mutations), and peptide/protein de novo sequencing."


"SearchGUI is a highly adaptable open-source common interface for configuring and running proteomics search and de novo engines, currently supporting X! Tandem, MS-GF+, MS Amanda, MyriMatch, Comet, Tide, Andromeda, OMSSA, Novor and DirecTag."


"Byonic™ is our full MS/MS search engine providing unequalled sensitivity for comprehensive peptide and protein identification. Byonic™ results can be input into Byologic™ and/or Byomap™ along with the raw mass spec data and any HPLC data."



"The Crux mass spectrometry analysis toolkit is an open source project that aims to provide users with a cross-platform suite of analysis tools for interpreting protein mass spectrometry data"


"MaxQuant is a quantitative proteomics software package designed for analyzing large mass-spectrometric data sets. It is specifically aimed at high-resolution MS data."


"ProteinPilot™ Software is used for protein identification and relative protein expression analysis for protein research. ProteinPilot Software is compatible with all proteomics MS/MS systems via the *.mgf format."


Trans-Proteomic Pipeline

"The Trans-Proteomic Pipeline (TPP) is a collection of integrated tools for MS/MS proteomics"

Quantitation Analysis

Once peptides have been identified and relative protein abundances determined, statistical tests determine which of the proteins are significantly differentially expressed between biological conditions and require further study.
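Because thousands of proteins are tested at once, the p-values from such tests are usually corrected for multiple testing. The sketch below is one common approach, not necessarily the one used by the tools listed here: it computes a log2 fold change per protein and applies the Benjamini-Hochberg adjustment. The intensities and p-values are invented, and the statistical test that would produce the p-values is assumed to have been run elsewhere.

```python
import math


def log2_fold_change(control, treated):
    """log2 ratio of mean treated intensity to mean control intensity."""
    mean_control = sum(control) / len(control)
    mean_treated = sum(treated) / len(treated)
    return math.log2(mean_treated / mean_control)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical proteins with p-values from an earlier statistical test.
proteins = ["P1", "P2", "P3", "P4"]
p_values = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(p_values)

for name, p_adj in zip(proteins, adjusted):
    flag = "significant" if p_adj <= 0.05 else "not significant"
    print(f"{name}: adjusted p = {p_adj:.3f} ({flag})")

print(log2_fold_change([100.0, 100.0], [400.0, 400.0]))
```

A log2 fold change of 1 corresponds to a doubling of abundance, and -1 to a halving.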



" Perseus contains a comprehensive portfolio of statistical tools for high-dimensional omics data analysis covering normalization, pattern recognition, time-series analysis, cross-omics comparisons and multiple-hypothesis testing"



"Visualize and validate complex MS/MS proteomics experiments"




"Progenesis QI for proteomics is discovery analysis software for your LC-MS data; a revolutionary ‘difference engine’ that works in a unique way to help you to answer your biological question."

Functional Annotation

Once a list of significantly differentially expressed proteins has been determined, these proteins need to be functionally annotated to determine what roles they may be playing in the biological system. Using over-representation analysis and gene set enrichment analysis, proteins can be assigned to functional units. The classifications of these functional units are found in gene ontologies and in metabolic and signalling pathway databases.
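Over-representation analysis is commonly framed as a hypergeometric test: given how many background proteins carry an annotation, how surprising is the number of annotated proteins in the significant list? The sketch below illustrates that calculation in Python. The protein identifiers, the gene set and the function names are hypothetical, and dedicated packages add multiple-testing correction and curated annotations on top of this.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k) when drawing n items from N, of which K are annotated."""
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i)
                     for i in range(k, upper + 1))
    return favourable / comb(N, n)


def over_representation(significant, background, gene_set):
    """Return (overlap size, p-value) for one gene set."""
    background = set(background)
    significant = set(significant) & background
    annotated = set(gene_set) & background
    overlap = significant & annotated
    p_value = hypergeom_enrichment_p(
        len(overlap), len(significant), len(annotated), len(background)
    )
    return len(overlap), p_value


# Hypothetical example: 10 quantified proteins, 4 of them in a pathway,
# and 3 significant proteins of which 2 belong to that pathway.
background = [f"P{i}" for i in range(1, 11)]
pathway = {"P1", "P2", "P3", "P4"}
significant = {"P1", "P2", "P9"}

overlap, p_value = over_representation(significant, background, pathway)
print(f"overlap = {overlap}, p = {p_value:.4f}")
```

The background should be the set of proteins that could have been detected in the experiment, not the whole proteome, otherwise enrichment is overstated.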


"statistical analysis and visualization of functional profiles for genes and gene clusters"


"REACTOME is an open-source, open access, manually curated and peer-reviewed pathway database."


"STRING is a database of known and predicted protein-protein interactions."


"topGO package provides tools for testing GO terms while accounting for the topology of the GO graph."


"KEGG is a database resource for understanding high-level functions and utilities of the biological system, such as the cell, the organism and the ecosystem, from molecular-level information."


"Provide a rapid means to reduce large lists of genes into functionally related groups of genes to help unravel the biological content captured by high throughput technologies."


Bioinformaticians are available to assist you with your project.


The earlier you contact them the more assistance they will be able to offer. In particular, the experimental design is critical in ensuring the success of any project. Contacting a statistician and ensuring your experiment has enough statistical power will go a long way to ensuring its success.


Selecting the best technology for your project will ensure you get the best results. Omics research is costly, so choosing the most appropriate technology for your experiment and budget is critical.


It is best to first run a pilot study and have an expert check the quality of the results before continuing with the bulk of the analysis. The pilot project will also allow you to familiarise yourself with the sample analysis process, the data generated and the means of analysis before embarking on the main project.


Once you have produced the data, you will realise that omics technologies produce mountains of data. Handling these volumes often requires some expertise in big data. Fortunately, we have tools and resources to store and process your data, making it easier for you to understand. Contact our team of expert bioinformaticians for assistance at all levels of your project.

Dr Shaun Garnett

Post-Doctoral Fellow at University of Cape Town


  • Transcriptomics

  • Proteomics

  • Differential Abundance Statistics



  • Liquid Chromatography

  • Mass Spectrometry

  • Discovery Proteomics

  • Statistics

  • Expression Data Functional Annotation


  • MaxQuant

  • Skyline

  • TPP

  • clusterProfiler

  • topGO

  • STRINGdb

  • ReactomePA

©2018 by SA-DIPLOMICS.