NPC E4 Bioinformatics in Proteomics
Theme leaders:
Background
Bioinformatics is an essential component of proteomics. Although bioinformatics research has been given a strong impetus through NGI via the formation of NBIC, proteomics-driven bioinformatics has lagged behind. The international science review singled out this shortcoming in its review of the NPC.
NPC II continues and intensifies the path taken in NPC I. The ever-increasing flow of data from proteomics experiments has created a large demand for bioinformatics to develop the infrastructure needed for the storage, processing, analysis and visualization of proteomics data. In addition, targeted bioinformatics that addresses specific needs in the cutting-edge research areas within the NPC is required.
Proteomics data contain a wealth of information that largely goes unused; this becomes especially apparent for mass spectrometry (MS) data. In short, MS data are recorded, pre-processed, filtered and analyzed using search algorithms and databases. Each step has its advantages, but potentially good spectra are lost along the way. Large numbers of proteins are often identified, yet these identifications typically originate from only 20-40% of all recorded spectra, so much of the data remains open for exploration. It is in this area that bioinformatics can provide generic tools for data (pre)processing, alternative sequence search tools and peptide/protein annotation. Furthermore, the reliability of protein identification could be increased by comparing experimentally measured molecular properties with their in silico calculated counterparts, such as isotope ratio and hydrophobicity (retention time). In addition, a database structure for the storage of annotated (raw) mass spectrometry data makes it possible to perform higher-order analyses such as cross-experiment comparisons. Integration of annotated quantitative proteomics data with other ‘omics data can provide additional insights into protein complexes, (protein) networks and biological pathways. The NPC plans to collaborate with (inter)national partners, such as the EBI and NBIC, and with industrial instrument and software developers to (co)develop tools and databases for this purpose. Finally, experimental design, biostatistics, statistical validation techniques, and efficient clustering and classification algorithms will help researchers analyze and interpret their data more efficiently.
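The comparison of measured and in silico calculated properties mentioned above can be illustrated with a minimal sketch for retention time. It assumes that summed Kyte-Doolittle hydropathy is an acceptable stand-in for a retention-time predictor and that a straight line calibrated on confident identifications is adequate; both are simplifications chosen for illustration, and the function names and the tolerance parameter are hypothetical rather than part of any NPC tool.

```python
# Kyte-Doolittle hydropathy scale. Using it as a retention-time proxy is an
# assumption of this sketch, not a method named in the programme text.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobicity(peptide):
    """Sum the hydropathy values of all residues in a peptide sequence."""
    return sum(KYTE_DOOLITTLE[residue] for residue in peptide)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("calibration peptides must differ in hydrophobicity")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def flag_inconsistent(calibration, candidates, tolerance_min):
    """Return candidate peptides whose observed retention time deviates
    from the calibrated prediction by more than tolerance_min minutes.

    Both inputs are lists of (peptide_sequence, observed_rt_minutes).
    """
    slope, intercept = fit_line(
        [hydrophobicity(peptide) for peptide, _ in calibration],
        [rt for _, rt in calibration],
    )
    flagged = []
    for peptide, observed_rt in candidates:
        predicted_rt = slope * hydrophobicity(peptide) + intercept
        if abs(observed_rt - predicted_rt) > tolerance_min:
            flagged.append(peptide)
    return flagged
```

Calibrating on confident identifications and testing only borderline ones keeps doubtful matches from distorting the fit. A real predictor would also need to account for peptide length, modifications and gradient shape.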
Approach
Within the NPC we aspire to cover as many proteomics bioinformatics topics as possible, with maximum cross-connectivity between projects and themes. Bioinformatics for proteomics is funded from several sources: the NPC Bioinformatics programme, the Bioinformatics hotel, NBIC (I & II), and NGI, which provides additional provisional funding to strengthen bioinformatics for proteomics.
The Bioinformatics programme comprises three projects and three matching projects. The first project will focus on developing better algorithms for peptide and protein identification through improved data (pre)processing, comparison of experimental and in silico calculated molecular properties, MS/MS spectral network analysis across multiple experiments, and incorporation of several post-translational modifications in de novo search algorithms. The second project aims to develop highly accurate data processing tools for label-free and isotope-labeled quantification data, including automated alignment of multiple datasets, and to design statistical analysis and validation strategies using spiking experiments and statistical simulations. The third project is aimed at international collaboration. The NPC will collaborate with the Proteomics Services Group at the European Bioinformatics Institute (EBI) to expand the PRIDE proteomics repository with the ability to capture quantitative proteomics data. Here a student at the NPC will initially contribute reference datasets, and can subsequently use the quantitative information captured in the updated repository for further analyses and potentially for subtractive proteomics.
The first matching project will focus on pre-processing of MS(/MS) data, initially to facilitate more and better identifications using peptide search algorithms. The second matching project will implement the developed retention time algorithms in a web-service-based tool in collaboration with IBM. Finally, the third matching project will enable the PRIDE database for protein identifications to store quantitative mass spectrometry data (comparable to ArrayExpress for microarray data).
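To make the idea of aligning multiple label-free datasets concrete, the sketch below pairs features from two LC-MS runs when their m/z values agree within a ppm tolerance and their retention times agree within a fixed window. It is a simple greedy nearest-neighbour illustration, not the alignment method the project will develop; the feature layout `(mz, rt_minutes, intensity)`, the function names and the default tolerances are all assumptions.

```python
def ppm_difference(mz_a, mz_b):
    """Relative m/z difference in parts per million, relative to mz_a."""
    return abs(mz_a - mz_b) / mz_a * 1e6


def match_features(run_a, run_b, ppm_tol=10.0, rt_tol=1.0):
    """Greedily pair features between two runs.

    Each run is a list of (mz, rt_minutes, intensity) tuples.
    Returns a list of (index_in_run_a, index_in_run_b) pairs.
    Tolerance defaults are illustrative assumptions only.
    """
    matches = []
    used_b = set()  # each feature in run_b may be matched at most once
    for i, (mz_a, rt_a, _) in enumerate(run_a):
        best_j = None
        best_gap = None
        for j, (mz_b, rt_b, _) in enumerate(run_b):
            if j in used_b:
                continue
            if ppm_difference(mz_a, mz_b) > ppm_tol:
                continue
            gap = abs(rt_a - rt_b)
            if gap > rt_tol:
                continue
            # Among candidates within tolerance, keep the closest in time.
            if best_gap is None or gap < best_gap:
                best_j, best_gap = j, gap
        if best_j is not None:
            used_b.add(best_j)
            matches.append((i, best_j))
    return matches
```

Matched pairs can then feed intensity-ratio calculations across runs. The nested loop is quadratic in the number of features, and no retention-time drift correction is applied, so a production tool would need indexing and a warping step on top of this.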
Deliverables
- (Co-)development of a proteomics platform
- A proteomics data processing and analysis pipeline
- Statistical methods for the analysis of LC-MS data
- De novo algorithms for confident analysis of peptide fragmentation spectra
- A warehouse for mass spectrometry data (PRIDE) in collaboration with EBI
- Pre-processing and filtering algorithms for fragmentation spectra
- Hardware and software infrastructure for MS data analysis
- Novel and more efficient algorithms for 2D-LC MS(/MS) data analysis
- Statistical methods for label-free quantification in biomarker research
- Methods for integrating quantitative proteomics data with other ‘omics data


