Software


This page describes the computational tools and software developed by the Nesvizhskii lab. You can learn more about our tools at the following courses:

CSHL Course on Proteomics Description
July 2014, Cold Spring Harbor Laboratory (CSHL), New York
We contribute a workshop (lecture and hands-on tutorial) on the analysis of AP-MS protein-protein interaction data.

US HUPO short course: Stable and Transient Protein-Protein Interactions: Discovery, Quantification and Validation
Description and Registration
April 6, 2014 Seattle, Washington
Instructors: Ileana Cristea, Alexey Nesvizhskii
This course will cover fundamental and practical aspects of studying protein interactions, from experimental design to data analysis. It will include a practical demonstration of the CRAPome resource.

ASMS short course: Bioinformatics for Protein Identification
Description and Registration
June 2014 Baltimore, Maryland
Instructors: Alexey Nesvizhskii, David Tabb, Nuno Bandeira
This will be the fourth year we teach this course. The course seeks to familiarize proteomics researchers with the inner workings of the software that enables the analysis of mass spectrometry-based proteomics data. The course includes a hands-on tutorial on the Trans-Proteomic Pipeline.



CRAPome

The Contaminant Repository for Affinity Purification (CRAPome) is a database of annotated negative controls contributed by the proteomics research community. It addresses the common problem of distinguishing real interactions from the non-specific background (also known as 'contaminants'). The database and the associated computational tools for scoring protein interactions are available online. The web interface can be used to explore the database and to analyze user-uploaded data. Users can:
- query one protein at a time
- download background contaminant lists for selected experimental conditions, and/or
- upload their own data and perform analysis using SAINT or SAINTexpress and empirical scoring

Website: http://www.crapome.org

Key Publications:
The CRAPome: a contaminant repository for affinity purification - mass spectrometry data. D. Mellacheruvu et al. Nature Methods 10: 730-6 (2013) Manuscript

SAINT

Computational models and software for assigning confidence scores to protein-protein interactions in label-free quantitative AP-MS datasets. For each observed interaction with associated label-free quantification, SAINT calculates the probability of true interaction. The modeling incorporates various data normalization steps and can also use the quantitative information from negative control purifications to improve specificity in small-to-intermediate scale experiments (SAINT v. 2). The method was initially developed for label-free spectral count data, but was later extended to MS1 intensity-based quantitative data (SAINT-MS1). SAINTexpress is a recently developed fast version of the algorithm.
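The idea of scoring a prey against its negative-control background can be illustrated with a toy two-component Poisson mixture. This is only a sketch under strong assumptions, not the SAINT model itself: the function names, the fixed `true_rate`, the `prior_true` value, and the `floor` on the background rate are all hypothetical, whereas SAINT estimates its model parameters from the whole dataset.

```python
from math import exp, lgamma, log
from typing import Sequence


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability of observing count k given rate lam (lam > 0)."""
    return exp(k * log(lam) - lam - lgamma(k + 1))


def interaction_probability(
    bait_count: int,
    control_counts: Sequence[int],
    true_rate: float,          # hypothetical: assumed mean count of a true interactor
    prior_true: float = 0.5,   # hypothetical: prior probability of a true interaction
    floor: float = 0.1,        # hypothetical: keeps the background rate above zero
) -> float:
    """Posterior probability that the prey count comes from the 'true' component."""
    # Background (false-interaction) rate taken from the negative control runs.
    false_rate = max(floor, sum(control_counts) / len(control_counts))
    p_true = prior_true * poisson_pmf(bait_count, true_rate)
    p_false = (1.0 - prior_true) * poisson_pmf(bait_count, false_rate)
    return p_true / (p_true + p_false)


# A prey seen with 20 spectra in the bait purification but ~1 in controls
# scores close to 1; a prey seen at the control level scores close to 0.
high = interaction_probability(20, [1, 0, 2], true_rate=15.0)
low = interaction_probability(1, [1, 0, 2], true_rate=15.0)
```

The point of the sketch is only that the same prey count can be strong or weak evidence depending on how often that prey appears in the controls.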

SAINT and SAINTexpress can be run online using the CRAPome: www.crapome.org

Download source code: http://saint-apms.sourceforge.net/

SAINT Tools and Publication:
SAINT v 2: Choi, H., Larsen, B., Lin, Z.-Y., Breitkreutz, A., Mellacheruvu, D., Fermin, D., Qin, Z.S., Tyers, M., Gingras, A.-C. and Nesvizhskii, A.I. Nature Methods 8:70-3 (2010) Manuscript
* This is the key reference for the SAINT series of algorithms. It introduces the semi-supervised SAINT model, which is based on comparing the spectral count distribution across the negative control runs to the counts for the same prey in the purification of the bait.

SAINT v 1: Breitkreutz, A., Choi, H., Sharon, J., Boucher, L., Neduva, V., Larsen, B.G., Lin, Z.-Y., Breitkreutz, B.-J., Stark, C., Liu, G., Ahn, J., Dewar-Darch, D., Tang, X., Almeida, V., Qin, Z.S., Pawson, T., Gingras, A.-C., Nesvizhskii, A., Tyers, M. Global architecture of the yeast kinome interaction network, Science 328, 1043 - 104 (2010) Manuscript
* The first, unsupervised SAINT model that does not require negative controls for scoring. Designed for large scale projects that profile > 20-30 baits that share very few interactions.

SAINT-MS1: Choi H., Glatter T., Gstaiger M., Nesvizhskii A.I. SAINT-MS1: Protein-protein interaction scoring using label-free intensity data in affinity purification-mass spectrometry experiments. Journal of Proteome Research (2012) 11(4):2619-24 Manuscript
* Extension of the model and the software to continuous (e.g. intensity-based) quantitative data.

SAINTexpress: Teo, G., Liu, G., Zhang, J., Gingras, A.C., Choi, H. SAINTexpress: improvements and additional features in Significance Analysis of INTeractome for AP-MS data. J Proteomics in press (2013) Manuscript
* A computationally efficient version of the SAINT tool, with an optional re-scoring based on externally acquired information.

Reviews and Tutorials:
Choi, H., Liu, G., Tyers, M., Gingras, A.-C. and Nesvizhskii, A.I. (2012) Analyzing protein-protein interactions from affinity purification-mass spectrometry data with SAINT. Curr Protoc Bioinformatics, Chapter 8:Unit8.15. Manuscript
* This is a detailed protocol for the use of SAINT, which defines options (minFold, lowMode and Norm) that can be tailored to the dataset to be analyzed.

Nesvizhskii, A.I. Computational and informatics strategies for identification of specific protein interaction partners in affinity purification mass spectrometry experiments. Proteomics 12:1639-55 (2012) Manuscript
* This is a detailed review on the analysis of AP-MS data.

PROHITS

ProHits is a Laboratory Information Management System (LIMS) for interaction proteomics developed primarily by the Anne-Claude Gingras and Mike Tyers laboratories in collaboration with the Nesvizhskii lab. It is a comprehensive system that integrates the TPP/iProphet for peptide/protein identification and the SAINT suite of tools for interaction scoring.

Download: http://prohitsms.com/Prohits_download/list.php/

Key Publications:
Liu G., Zhang J., Larsen B., Stark C., Breitkreutz A., Lin Z.Y., Breitkreutz B.J., Ding Y., Colwill K., Pasculescu A., Pawson T., Wrana J.L., Nesvizhskii A.I., Raught B., Tyers M., Gingras A.C. ProHits: integrated software for mass spectrometry-based interaction proteomics, Nat Biotech 28:1015-17 (2010). Manuscript
* This is the key manuscript that describes the ProHits LIMS.

Liu G., Zhang J., Choi H., Lambert J.P., Srikumar T., Larsen B., Nesvizhskii A.I., Raught B., Tyers M., Gingras A.C. Using ProHits to Store, Annotate, and Analyze Affinity Purification - Mass Spectrometry (AP-MS) Data. Current Protocols in Bioinformatics Unit 8.16 Manuscript
* This is a Protocols manuscript that describes the installation and use of the ProHits system. It also introduces a simplified, virtual machine implementation (ProHits Lite software).

LuciPHOr

LuciPHOr is a program for localizing the sites of post-translational modifications (e.g. phosphorylation) on peptide sequences. LuciPHOr carries out simultaneous localization on all candidate sites in each peptide and estimates the false localization rate (FLR) based on the target-decoy framework, where decoy phosphopeptides, generated by placing artificial phosphorylation(s) on non-candidate residues, compete with the non-decoy phosphopeptides. LuciPHOr also reports approximate site-level confidence scores for all candidate sites as a means to localize additional sites from multiphosphorylated peptides in which localization can be partially achieved. LuciPHOr is compatible with any MS/MS database search engine output processed through the Trans-Proteomic Pipeline.
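The target-decoy idea behind the FLR estimate can be sketched in a few lines. This is a minimal illustration, assuming each localization already carries a score and a decoy flag. The function name and input layout are hypothetical, and the simple decoy/target ratio used here omits any correction the published algorithm applies (for example, for the differing numbers of candidate and non-candidate residues).

```python
from typing import List, Tuple


def estimate_flr(scored: List[Tuple[float, bool]], threshold: float) -> float:
    """Estimate the false localization rate among localizations scoring >= threshold.

    scored: (score, is_decoy) pairs; is_decoy marks phosphopeptides whose
    modification was placed on a non-candidate residue.
    """
    decoys = sum(1 for score, is_decoy in scored if score >= threshold and is_decoy)
    targets = sum(1 for score, is_decoy in scored if score >= threshold and not is_decoy)
    if targets == 0:
        return 0.0
    # Decoys passing the threshold approximate the number of false target
    # localizations passing the same threshold.
    return min(1.0, decoys / targets)


example = [
    (9.1, False), (8.4, False), (7.7, True), (6.9, False),
    (5.2, False), (4.8, True), (3.0, True),
]
print(estimate_flr(example, 5.0))  # 1 decoy / 4 targets -> 0.25
```

Raising the threshold trades sensitivity for a lower estimated FLR; in this example a threshold of 8.0 leaves two targets and no decoys.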

Download: http://luciphor.sf.net

Key Publication:
Fermin D., Walmsley S.J., Gingras A.C., Choi H., Nesvizhskii A.I. LuciPHOr: algorithm for phosphorylation site localization with false localization rate estimation using modified target-decoy approach. Molecular and Cellular Proteomics 12 3409-19 (2013) Manuscript

ABACUS

ABACUS is a computational tool for extracting label-free quantitative information (spectral counts) from MS/MS data sets. It aggregates data from multiple experiments, adjusts spectral counts to accurately account for peptides shared across multiple proteins, and performs common normalization steps. It can also output the spectral count data at the gene level, thus simplifying the integration and comparison between gene and protein expression data. Abacus is compatible with the widely used Trans-Proteomic Pipeline suite of tools and comes with a graphical user interface making it easy to interact with the program. The main aim of Abacus is to streamline the analysis of spectral count data by providing an automated, easy to use solution for extracting this information from proteomic data sets for subsequent, more sophisticated statistical analysis.
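One common way to account for peptides shared across proteins is to split each shared peptide's spectral count in proportion to the unique-peptide counts of the proteins that share it. The sketch below illustrates that general idea only. The function name, the input layout, and the equal-split fallback are assumptions, and the weighting rule may differ from the one Abacus actually applies.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def adjusted_spectral_counts(peptides: List[Tuple[int, List[str]]]) -> Dict[str, float]:
    """Return per-protein spectral counts with shared peptides apportioned.

    peptides: (spectral_count, proteins_containing_the_peptide) pairs.
    """
    # Pass 1: counts from peptides that map to exactly one protein.
    unique: Dict[str, float] = defaultdict(float)
    for count, proteins in peptides:
        if len(proteins) == 1:
            unique[proteins[0]] += count

    # Pass 2: split each shared peptide by the proteins' unique-count share.
    adjusted: Dict[str, float] = defaultdict(float, unique)
    for count, proteins in peptides:
        if len(proteins) > 1:
            total = sum(unique[p] for p in proteins)
            for p in proteins:
                # Fall back to an equal split when no protein has unique evidence.
                weight = unique[p] / total if total > 0 else 1.0 / len(proteins)
                adjusted[p] += count * weight
    return dict(adjusted)


counts = adjusted_spectral_counts([(6, ["A"]), (2, ["B"]), (4, ["A", "B"])])
print(counts)  # {'A': 9.0, 'B': 3.0}
```

Here protein A holds 6 of the 8 unique spectra, so it receives 3 of the 4 shared spectra, and B receives the remaining 1.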

Download: http://abacustpp.sf.net

Key Publications:
Fermin D., Basrur V., Yocum A.K., Nesvizhskii A.I. Abacus: A computational tool for extracting and pre-processing spectral count data for label-free quantitative proteomic analysis. Proteomics 7:1340-45 (2011) Manuscript

QPROT

QPROT is a software tool for differential protein expression analysis using MS1 and MS/MS-level continuous quantitative data. It features a hierarchical model with a predictive recursive algorithm, and includes percentile normalization and multithreading for fast computing.

Download: http://sourceforge.net/projects/qprot

QSPEC

Software for the analysis of differential protein expression using label-free spectral count data. The hierarchical model of QSPEC pools statistical information for mean and variance estimates across all proteins when only a limited number of replicates is available. In a typical quantitative proteomics experiment, there are rarely enough replicates to make conventional statistical tests such as the t-test applicable. QSPEC addresses this problem and calculates the ratio of likelihoods (Bayes factor) for differential expression for each protein based on certain model assumptions (Poisson-family distributions for count data and Gaussian distribution for intensity data).
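To make the notion of a Bayes factor for count data concrete, here is a deliberately simplified sketch for a single protein. It assumes Poisson counts with a conjugate Gamma prior on the rate and compares "one rate per condition" against "one shared rate". The function names and the prior parameters `a` and `b` are hypothetical, and the sketch does not pool information across proteins, so it is not the hierarchical QSPEC model.

```python
from math import lgamma, log
from typing import Sequence


def log_marginal(counts: Sequence[int], a: float = 1.0, b: float = 1.0) -> float:
    """Log marginal likelihood of Poisson counts under a Gamma(shape=a, rate=b) prior.

    The -sum(log(x_i!)) term is omitted because it cancels in the Bayes factor.
    """
    n = len(counts)
    s = sum(counts)
    return a * log(b) - lgamma(a) + lgamma(a + s) - (a + s) * log(b + n)


def log_bayes_factor(
    group1: Sequence[int], group2: Sequence[int], a: float = 1.0, b: float = 1.0
) -> float:
    """Log Bayes factor: separate rates per condition versus one shared rate."""
    separate = log_marginal(group1, a, b) + log_marginal(group2, a, b)
    shared = log_marginal(list(group1) + list(group2), a, b)
    return separate - shared


# Positive values favor differential expression, negative values favor no change.
changed = log_bayes_factor([1, 0, 2], [20, 25, 22])
unchanged = log_bayes_factor([5, 5, 5], [5, 5, 5])
```

With three replicates per condition, the first protein yields a clearly positive log Bayes factor and the second a negative one, without relying on a per-protein variance estimate.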

Web server: http://www.nesvilab.org/qspec.php/
Software download: http://qspec.sourceforge.net/

Key Publication:
Choi H, Fermin D, Nesvizhskii AI. Significance analysis of spectral count data in label-free shotgun proteomics. Mol Cell Proteomics 7:2373-85 (2008). Manuscript

NESTEDCLUSTER

A biclustering method for constructing protein complexes using (filtered) high-confidence interaction data from label-free quantitative AP-MS experiments. The method forms bait clusters based on the similarity of quantitative interaction profiles as anchors of protein complexes, and identifies submatrices of prey proteins showing consistent quantitative association within the anchor bait clusters. The statistical model determines the optimal number of bait clusters and prey clusters in the data, automatically yielding the configuration of highly probable protein complexes.

Download: http://nestedcluster.sourceforge.net/

Key Publications:
Choi H., Kim S., Gingras A.C., Nesvizhskii A.I. Analysis of Protein Complexes via Model-based Biclustering of Label-free Quantitative AP-MS Data. Mol. Syst. Biol. 6:385 (2010). Manuscript


Trans-Proteomic Pipeline

We developed core components of the widely used open-source data analysis pipeline (Trans-Proteomic Pipeline, TPP) for primary processing of mass spectrometry-based proteomic data. The pipeline is currently maintained by the Seattle Proteomics Center at the Institute for Systems Biology (http://www.systemsbiology.org/).

Download: http://tools.proteomecenter.org/wiki/index.php?title=Software:TPP

Google Discussion Group: http://groups.google.com/group/spctools-discuss

Key Publications:
PeptideProphet: Keller A., Nesvizhskii A.I., Kolker E., Aebersold R. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal Chem. 74:5383-92 (2002). Manuscript
* PeptideProphet performs statistical validation of peptide identifications from tandem mass (MS/MS) spectra. It can analyze the results of the most commonly used MS/MS database search tools, including SEQUEST, MASCOT, and X! Tandem.

ProteinProphet: Nesvizhskii A.I., Keller A., Kolker E., Aebersold R. A statistical model for identifying proteins by tandem mass spectrometry. Anal Chem. 75:4646-58 (2003). Manuscript
* ProteinProphet takes as input the list of identified peptides, assembles peptides into proteins, and calculates the probabilities and FDR at the protein level.

iProphet: Shteynberg D., Deutsch E.W., Lam H., Eng J.K., Sun Z., Tasman N., Mendoza L., Moritz R.L., Aebersold R., Nesvizhskii A.I. iProphet: multi-level integrative analysis of shotgun proteomic data improves peptide and protein identification rates and error estimates. Mol Cell Proteomics. 10:M111.007690 (2011). Manuscript
* iProphet further improves upon PeptideProphet modeling via a multi-level integrative framework. It also allows integration of results from multiple MS/MS database search tools.

Qualscore: Nesvizhskii A.I., Roos F.F., Grossmann J., Vogelzang M., Eddes J.S., Gruissem W., Baginsky S., Aebersold R. Dynamic spectrum quality assessment and iterative computational analysis of shotgun proteomic data: toward more efficient identification of post-translational modifications, sequence polymorphisms, and novel peptides. Mol Cell Proteomics. 5:652-70 (2006). Manuscript
* QualScore is a computational tool for assessing the quality of MS/MS spectra. It is designed to find unassigned high-quality MS/MS spectra that may represent novel peptides or peptides containing post-translational modifications.

Recommended Reviews and Tutorials:
Nesvizhskii A.I., Vitek O., Aebersold R. Analysis and validation of proteomic data generated by tandem mass spectrometry. Nat Methods. 4:787-97 (2007). Manuscript

Nesvizhskii A.I. A survey of computational methods and error rate estimation procedures for peptide and protein identification in shotgun proteomics. J Proteomics. 73:2092-123 (2010). Manuscript

Deutsch E.W., Mendoza L., Shteynberg D., Farrah T., Lam H., Tasman N., Sun Z., Nilsson E., Pratt B., Prazen B., Eng J.K., Martin D.B., Nesvizhskii A.I., Aebersold R. A guided tour of the Trans-Proteomic Pipeline. Proteomics. 10:1150-9 (2010). Manuscript

Nesvizhskii A.I., Aebersold R. Interpretation of shotgun proteomic data: the protein inference problem. Mol Cell Proteomics. 4:1419-40 (2005). Manuscript