Computational models and software for assigning confidence scores to protein-protein interactions in label-free quantitative AP-MS datasets. For each
observed interaction with associated label-free quantification, SAINT calculates the probability of a true interaction. The model incorporates several data normalization
steps and can also use the quantitative information from negative control purifications to improve specificity in small-to-intermediate-scale experiments
(SAINT v2). The method was initially developed for label-free spectral count data but was later extended to MS1 intensity-based quantitative data (SAINT-MS1). SAINTexpress is a recently developed fast version of the algorithm.
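To make the idea of scoring against negative controls concrete, here is a minimal Python sketch of a two-component Poisson model. It is written under simplifying assumptions and is not the SAINT implementation: the "false interaction" mean is taken as the prey's average count in the negative controls, the "true interaction" mean is its average count in the bait purifications (never allowed below the control mean), and the prior probability of a true interaction (prior_true) and the count floor (min_mean) are hypothetical parameters. Per-replicate posterior probabilities are averaged into one score.

```python
import math
from typing import Sequence


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability of k counts with mean lam (computed in log space)."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def interaction_probability(
    bait_counts: Sequence[int],
    control_counts: Sequence[int],
    prior_true: float = 0.5,
    min_mean: float = 0.1,
) -> float:
    """Average posterior probability of a true bait-prey interaction.

    Hypothetical simplification: 'false' counts ~ Poisson(mean of controls),
    'true' counts ~ Poisson(mean of bait replicates, floored at the false mean).
    """
    false_mean = max(sum(control_counts) / len(control_counts), min_mean)
    true_mean = max(sum(bait_counts) / len(bait_counts), false_mean)
    posteriors = []
    for k in bait_counts:
        p_true = prior_true * poisson_pmf(k, true_mean)
        p_false = (1 - prior_true) * poisson_pmf(k, false_mean)
        posteriors.append(p_true / (p_true + p_false))
    return sum(posteriors) / len(posteriors)
```

For a prey seen at 18-25 spectra in three bait replicates but at most one spectrum in four controls, the score is close to 1; when bait and control counts are similar, it falls back to the prior.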
SAINT and SAINTexpress can be run online using the CRAPome: www.crapome.org
Download source code: http://saint-apms.sourceforge.net/
SAINT Tools and Publications:
SAINT v 2: Choi, H., Larsen, B., Lin, Z.-Y., Breitkreutz, A., Mellacheruvu, D., Fermin, D., Qin, Z.S., Tyers, M., Gingras, A.-C. and Nesvizhskii, A.I. Nature Methods 8:70-3 (2010) Manuscript
* This is the key reference for the SAINT series of algorithms. It introduces the semi-supervised SAINT model, which is based on comparing the spectral count distribution across the negative control runs to the counts for the same prey in the purification of the bait.
SAINT v 1: Breitkreutz, A., Choi, H., Sharon, J., Boucher, L., Neduva, V., Larsen, B.G., Lin, Z.-Y., Breitkreutz, B.-J., Stark, C., Liu, G., Ahn, J., Dewar-Darch, D., Tang, X., Almeida, V., Qin, Z.S., Pawson, T., Gingras, A.-C., Nesvizhskii, A., Tyers, M. Global architecture of the yeast kinome interaction network, Science 328, 1043 - 104 (2010) Manuscript
* The first, unsupervised SAINT model, which does not require negative controls for scoring. Designed for large-scale projects that profile more than 20-30 baits sharing very few interactions.
SAINT-MS1: Choi H., Glatter T., Gstaiger M., Nesvizhskii A.I. SAINT-MS1: Protein-protein interaction scoring using label-free intensity data in affinity purification-mass spectrometry experiments. Journal of Proteome Research (2012) 11(4):2619-24 Manuscript
* Extension of the model and the software to continuous (e.g. intensity-based) quantitative data.
SAINTexpress: Teo, G., Liu, G., Zhang, J., Gingras, A.C., Choi, H. SAINTexpress: improvements and additional features in Significance Analysis of INTeractome for AP-MS data. J Proteomics in press (2013) Manuscript
* A computationally efficient version of the SAINT tool, with optional re-scoring based on externally acquired information.
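For continuous data such as MS1 intensities, the same comparison with negative controls can be sketched with Gaussian components on log-transformed intensities. The Python below is an illustrative assumption, not the published SAINT-MS1 model: it log2-transforms strictly positive intensities (missing values would need separate handling), estimates the "false" component from the negative controls, uses one shared standard deviation with a hypothetical floor min_sd, and averages per-replicate posteriors.

```python
import math
import statistics
from typing import Sequence


def normal_pdf(x: float, mu: float, sd: float) -> float:
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def intensity_interaction_probability(
    bait_intensities: Sequence[float],
    control_intensities: Sequence[float],
    prior_true: float = 0.5,
    min_sd: float = 0.5,
) -> float:
    """Average posterior probability of a true interaction from positive intensities.

    Hypothetical simplification: log2 intensities are Gaussian; the 'false' mean
    comes from the controls, the 'true' mean from the bait runs (floored at the
    false mean), and both components share the control spread (floored at min_sd).
    """
    bait_log = [math.log2(x) for x in bait_intensities]
    ctrl_log = [math.log2(x) for x in control_intensities]
    false_mu = statistics.mean(ctrl_log)
    true_mu = max(statistics.mean(bait_log), false_mu)
    sd = max(statistics.pstdev(ctrl_log), min_sd)
    posteriors = []
    for y in bait_log:
        p_true = prior_true * normal_pdf(y, true_mu, sd)
        p_false = (1 - prior_true) * normal_pdf(y, false_mu, sd)
        posteriors.append(p_true / (p_true + p_false))
    return sum(posteriors) / len(posteriors)
```

Working on the log scale keeps the multiplicative noise typical of intensity data roughly symmetric, which is why a Gaussian is a reasonable first approximation here.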
Reviews and Tutorials:
Choi, H., Liu, G., Tyers, M., Gingras, A.-C. and Nesvizhskii, A.I. (2012) Analyzing protein-protein interactions from affinity purification-mass spectrometry data with SAINT. Curr Protoc Bioinformatics, Chapter 8:Unit8.15. Manuscript
* This is a detailed protocol for the use of SAINT. It describes the options (minFold, lowMode and Norm) that can be tailored to the dataset being analyzed.
Nesvizhskii, A.I. Computational and informatics strategies for identification of specific protein interaction partners in affinity purification mass spectrometry experiments. Proteomics 12:1639-55 (2012) Manuscript
* This is a detailed review on the analysis of AP-MS data.
ProHits is a Laboratory Information Management System (LIMS) for interaction proteomics developed primarily by the Anne-Claude Gingras and Mike Tyers laboratories in collaboration with the
Nesvizhskii lab. It is a comprehensive system that integrates the TPP/iProphet for peptide/protein identification and the SAINT suite of tools for interaction scoring.
Liu G., Zhang J., Larsen B., Stark C., Breitkreutz A., Lin Z.Y., Breitkreutz B.J., Ding Y., Colwill K., Pasculescu A., Pawson T., Wrana J.L., Nesvizhskii A.I., Raught B., Tyers M., Gingras A.C. ProHits: integrated software for mass spectrometry-based interaction proteomics, Nat Biotech 28:1015-17 (2010). Manuscript
* This is the key manuscript that describes the ProHits LIMS system.
Liu G., Zhang J., Choi H., Lambert J.P., Srikumar T., Larsen B., Nesvizhskii A.I., Raught B., Tyers M., Gingras A.C. Using ProHits to Store, Annotate, and Analyze Affinity Purification - Mass Spectrometry (AP-MS) Data. Current Protocols in Bioinformatics Unit 8.16 Manuscript
* This is a Protocols manuscript that describes the installation and use of the ProHits system. It also introduces a simplified, virtual-machine implementation (ProHits Lite software).
Luciphor is a program for localizing the sites of post-translational modifications (e.g. phosphorylation) on peptide sequences. LuciPHOr carries out simultaneous localization on all candidate sites in each peptide and
estimates the false localization rate (FLR) based on the target-decoy framework, where decoy phosphopeptides generated by placing artificial phosphorylation(s) on non-candidate residues compete with the non-decoy
phosphopeptides. LuciPHOr also reports approximate site-level confidence scores for all candidate sites as a means to localize additional sites from multiphosphorylated peptides in which localization can be partially achieved.
LuciPHOr is compatible with any MS/MS database search engine output processed through the Trans-Proteomic Pipeline.
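The decoy idea can be illustrated with a short Python sketch. It is a deliberate simplification, not the LuciPHOr implementation: decoy sites are any residues other than S, T and Y, and the false localization rate at a score threshold is estimated by simply counting decoy versus target localizations that pass it (LuciPHOr's own FLR estimate is more elaborate). The function names and the score scale are hypothetical.

```python
from typing import List, Sequence

# Genuine phosphorylation candidate residues; any other residue can host a decoy site.
PHOSPHO_CANDIDATES = frozenset("STY")


def decoy_sites(peptide: str) -> List[int]:
    """0-based positions where an artificial (decoy) phosphorylation can be placed."""
    return [i for i, aa in enumerate(peptide) if aa not in PHOSPHO_CANDIDATES]


def estimate_flr(
    target_scores: Sequence[float], decoy_scores: Sequence[float], threshold: float
) -> float:
    """Naive FLR at a threshold: decoy localizations passing / target localizations passing."""
    n_target = sum(1 for s in target_scores if s >= threshold)
    n_decoy = sum(1 for s in decoy_scores if s >= threshold)
    return n_decoy / n_target if n_target else 0.0
```

Because decoy sites can never be correct, any decoy that scores as well as a target localization indicates how often a score that high arises by chance.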
Fermin D., Walmsley S.J., Gingras A.C., Choi H., Nesvizhskii A.I. LuciPHOr: algorithm for phosphorylation site localization with false localization rate estimation using modified target-decoy approach. Molecular and Cellular Proteomics 12:3409-19 (2013) Manuscript
Luciphor2 re-implements the original Luciphor algorithm (see above) in Java and expands it to work on any post-translational modification.
Luciphor2 has several advantages over the previous version:
It can run on any computer with a Java runtime
It can score any PTM
It can score results from any search tool
Like the original Luciphor, this release can process PeptideProphet XML files (pepXML). It can also read tab-delimited files with scores from any database search tool.
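Scoring an arbitrary PTM starts from enumerating every way the observed number of modifications could be distributed over the residues that can carry it; each candidate isoform is then scored against the spectrum. The Python below sketches only that enumeration step. The function name and the residue sets in the example are illustrative assumptions, not Luciphor2 configuration options.

```python
from itertools import combinations
from typing import Iterable, List, Tuple


def candidate_isoforms(
    peptide: str, residues: Iterable[str], n_mods: int
) -> List[Tuple[int, ...]]:
    """All placements of n_mods copies of one PTM on the peptide's candidate residues.

    Each isoform is returned as a tuple of 0-based modified positions.
    """
    allowed = set(residues)
    sites = [i for i, aa in enumerate(peptide) if aa in allowed]
    return list(combinations(sites, n_mods))


# Phosphorylation on S/T/Y versus, for example, acetylation on lysine (K).
phospho = candidate_isoforms("PSTYK", "STY", 2)
acetyl = candidate_isoforms("AKGKR", "K", 1)
```

Here phospho holds the three two-site placements (1, 2), (1, 3) and (2, 3); only the residue set changes between modification types.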
Fermin D., Avtonomov D., Choi H., Nesvizhskii A.I. LuciPHOr2: site localization of generic post-translational modifications from tandem mass spectrometry data Bioinformatics 2014 [Epub ahead of print] PubMed PMID: 25429062 Manuscript
ABACUS is a computational tool for extracting label-free quantitative information (spectral counts) from MS/MS data sets. It aggregates data from multiple experiments,
adjusts spectral counts to accurately account for peptides shared across multiple proteins, and performs common normalization steps. It can also output the spectral count data at the gene level, thus simplifying the integration and comparison between gene and protein expression
data. Abacus is compatible with the widely used Trans-Proteomic Pipeline suite of tools and comes with a graphical user interface, making it easy to interact with the program. The main aim of Abacus is to streamline
the analysis of spectral count data by providing an automated, easy-to-use solution for extracting this information from proteomic data sets for subsequent, more sophisticated statistical analysis.
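A minimal Python sketch of the two pre-processing steps described above, under stated assumptions: spectra from a shared peptide are split among its proteins in proportion to their unique spectral counts (equally when none has unique evidence), and the adjusted counts are then normalized as NSAF values (count divided by protein length, rescaled to sum to 1). These are common choices used here for illustration; this is not Abacus's code, and its exact formulas may differ.

```python
from typing import Dict, List, Sequence, Tuple


def adjusted_counts(
    unique: Dict[str, int], shared: Sequence[Tuple[int, List[str]]]
) -> Dict[str, float]:
    """Add shared-peptide spectra to unique counts, split in proportion to unique evidence.

    unique: protein -> spectral count from peptides unique to that protein
    shared: (spectral count, proteins sharing the peptide) for each shared peptide
    """
    adjusted = {protein: float(count) for protein, count in unique.items()}
    for count, proteins in shared:
        weights = [unique.get(p, 0) for p in proteins]
        total = sum(weights)
        for protein, weight in zip(proteins, weights):
            # Fall back to an equal split when no sharing protein has unique spectra.
            fraction = weight / total if total else 1 / len(proteins)
            adjusted[protein] = adjusted.get(protein, 0.0) + count * fraction
    return adjusted


def nsaf(counts: Dict[str, float], lengths: Dict[str, int]) -> Dict[str, float]:
    """Normalized spectral abundance factor: (count / length), rescaled to sum to 1."""
    saf = {protein: c / lengths[protein] for protein, c in counts.items()}
    total = sum(saf.values())
    return {protein: value / total for protein, value in saf.items()}
```

With unique counts A = 6 and B = 2 and one shared peptide observed in 4 spectra, A receives 3 of the shared spectra and B receives 1, giving adjusted counts of 9 and 3.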
Fermin D., Basrur V., Yocum A.K., Nesvizhskii A.I. Abacus: A computational tool for extracting and pre-processing spectral count data for label-free quantitative proteomic analysis. Proteomics 11:1340-45 (2011) Manuscript
QPROT is software for the analysis of differential protein expression using MS1- and MS/MS-level continuous quantitative data. It features a
hierarchical model with a predictive recursive algorithm, and includes percentile normalization and multithreading for fast computation.
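The description above does not spell out the percentile normalization, so the following Python is only a sketch of one common form of it, not QPROT's code: each run is rescaled so that a chosen percentile (a hypothetical parameter q, defaulting to 75) of its nonzero intensities matches the average of that percentile across runs.

```python
import statistics
from typing import List, Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """q-th percentile (0-100) with linear interpolation between sorted values."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def percentile_normalize(
    runs: Sequence[Sequence[float]], q: float = 75
) -> List[List[float]]:
    """Scale each run so its q-th percentile (over nonzero values) equals the mean across runs."""
    # Zeros (treated here as missing values) are excluded from the reference percentile.
    refs = [percentile([v for v in run if v > 0], q) for run in runs]
    target = statistics.mean(refs)
    return [[v * target / ref for v in run] for run, ref in zip(runs, refs)]
```

Using an upper percentile rather than the total makes the scaling factor less sensitive to a few very abundant proteins dominating a run.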
QSPEC is software for the analysis of differential protein expression using label-free spectral count data. Its hierarchical model pools statistical information for mean and variance estimates across all proteins when only a limited number of replicates is available. In a typical quantitative proteomics experiment, there are rarely enough replicates to make conventional tests such as the t-test applicable. QSPEC addresses this problem and calculates, for each protein, the ratio of likelihoods (Bayes factor) for differential expression based on certain model assumptions (Poisson-family distributions for count data and a Gaussian distribution for intensity data).
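The Bayes factor idea can be illustrated for a single protein with a conjugate Poisson-Gamma model. This Python sketch is a simplification and not QSPEC's hierarchical model: it ignores normalization for run size and the pooling of information across proteins, and the Gamma prior (shape a, rate b) is a hypothetical choice. It compares the marginal likelihood of the counts under separate means for the two conditions against a single shared mean.

```python
import math
from typing import Sequence


def log_marginal(counts: Sequence[int], a: float, b: float) -> float:
    """Log marginal likelihood of Poisson counts sharing one mean, with a
    Gamma(shape=a, rate=b) prior on that mean. The -sum(log y!) term is omitted
    because it is identical under both hypotheses and cancels in the Bayes factor."""
    s, n = sum(counts), len(counts)
    return a * math.log(b) - math.lgamma(a) + math.lgamma(a + s) - (a + s) * math.log(b + n)


def log_bayes_factor(
    group1: Sequence[int], group2: Sequence[int], a: float = 1.0, b: float = 0.1
) -> float:
    """log BF of 'each condition has its own mean' versus 'both share one mean'.
    Positive values favour differential expression."""
    separate = log_marginal(group1, a, b) + log_marginal(group2, a, b)
    shared = log_marginal(list(group1) + list(group2), a, b)
    return separate - shared
```

Identical counts in both conditions give a negative log Bayes factor, because the extra parameter of the two-mean model is penalized automatically when it does not improve the fit.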
Web server: http://www.nesvilab.org/qspec.php/
Software download: http://qspec.sourceforge.net/
Choi H, Fermin D, Nesvizhskii AI. Significance analysis of spectral count data in label-free shotgun proteomics. Mol Cell Proteomics 7:2373-85 (2008). Manuscript
A biclustering method for constructing protein complexes from (filtered) high-confidence interaction data from label-free quantitative AP-MS experiments. The method forms bait clusters, based on the similarity of quantitative interaction profiles, as anchors of protein complexes, and identifies submatrices of prey proteins showing consistent quantitative association within the anchor bait clusters. The statistical model determines the optimal number of bait and prey clusters in the data, automatically yielding the configuration of highly probable protein complexes.
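The first step, grouping baits with similar quantitative interaction profiles, can be pictured with a simple Python sketch. It swaps in a much cruder technique than the published model: baits are linked when the Pearson correlation of their prey-abundance profiles reaches a fixed, hypothetical threshold, and linked baits are merged by single linkage. Unlike the model-based method, it does not choose the number of clusters statistically and does not identify the prey submatrices.

```python
import statistics
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either profile is constant."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def bait_clusters(profiles: Dict[str, List[float]], threshold: float = 0.8) -> List[List[str]]:
    """Single-linkage groups of baits whose prey-abundance profiles correlate >= threshold."""
    parent = {bait: bait for bait in profiles}

    def find(bait: str) -> str:
        # Follow parent links to the group representative (with path halving).
        while parent[bait] != bait:
            parent[bait] = parent[parent[bait]]
            bait = parent[bait]
        return bait

    baits = list(profiles)
    for i, b1 in enumerate(baits):
        for b2 in baits[i + 1:]:
            if pearson(profiles[b1], profiles[b2]) >= threshold:
                parent[find(b1)] = find(b2)

    groups: Dict[str, List[str]] = {}
    for bait in baits:
        groups.setdefault(find(bait), []).append(bait)
    return list(groups.values())
```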
Choi H., Kim S., Gingras A.C., Nesvizhskii A.I. Analysis of Protein Complexes via Model-based Biclustering of Label-free Quantitative AP-MS Data. Mol. Syst. Biol. 6:385 (2010). Manuscript
We developed core components of the widely used open source data analysis pipeline (Trans-Proteomic Pipeline, TPP) for primary processing of mass spectrometry-based proteomic data.
The pipeline is currently maintained by the Seattle Proteomics Center at the Institute for Systems Biology http://www.systemsbiology.org/.
Google Discussion Group: http://groups.google.com/group/spctools-discuss
PeptideProphet: Keller A., Nesvizhskii A.I., Kolker E., Aebersold R. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal Chem. 74:5383-92 (2002). Manuscript
* PeptideProphet performs statistical validation of peptide identifications from tandem mass (MS/MS) spectra. It can analyze the results of the most commonly used MS/MS database search tools, including SEQUEST, MASCOT, and X! Tandem.
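PeptideProphet's core idea, learning from the data how the scores of correct and incorrect identifications are distributed and converting each score into a probability, can be sketched with expectation-maximization. In this simplified Python version both components are Gaussian and the input is assumed to be a single discriminant score per spectrum (at least two scores); the actual model uses different distribution families and additional information, such as the number of tryptic termini, so treat this as an illustration only.

```python
import math
import statistics
from typing import List, Sequence


def normal_pdf(x: float, mu: float, sd: float) -> float:
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def mixture_probabilities(
    scores: Sequence[float], iters: int = 100, min_sd: float = 1e-3
) -> List[float]:
    """Fit a two-Gaussian mixture to scores by EM and return, for each score, the
    posterior probability of belonging to the higher-mean ('correct') component."""
    n = len(scores)
    ordered = sorted(scores)
    # Initialise: lower half of scores as 'incorrect', upper half as 'correct'.
    mu0 = statistics.mean(ordered[: n // 2])
    mu1 = statistics.mean(ordered[n // 2:])
    sd0 = sd1 = max(statistics.pstdev(scores), min_sd)
    pi1 = 0.5
    post: List[float] = []
    for _ in range(iters):
        # E-step: posterior probability of the 'correct' component for each score.
        post = []
        for x in scores:
            p1 = pi1 * normal_pdf(x, mu1, sd1)
            p0 = (1 - pi1) * normal_pdf(x, mu0, sd0)
            post.append(p1 / (p1 + p0) if p1 + p0 > 0 else float(x > (mu0 + mu1) / 2))
        # M-step: re-estimate mixing weight, means and standard deviations.
        w1 = max(sum(post), 1e-12)
        w0 = max(n - sum(post), 1e-12)
        pi1 = w1 / n
        mu1 = sum(p * x for p, x in zip(post, scores)) / w1
        mu0 = sum((1 - p) * x for p, x in zip(post, scores)) / w0
        sd1 = max(math.sqrt(sum(p * (x - mu1) ** 2 for p, x in zip(post, scores)) / w1), min_sd)
        sd0 = max(math.sqrt(sum((1 - p) * (x - mu0) ** 2 for p, x in zip(post, scores)) / w0), min_sd)
    return post
```

Because the component distributions are re-estimated for each dataset, the same raw score can receive different probabilities in different experiments, which is the point of the empirical approach.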
ProteinProphet: Nesvizhskii A.I., Keller A., Kolker E., Aebersold R. A statistical model for identifying proteins by tandem mass spectrometry. Anal Chem. 75:4646-58 (2003). Manuscript
* ProteinProphet takes as input the list of identified peptides, assembles peptides into proteins, and calculates probabilities and the FDR at the protein level.
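The basic step of combining peptide evidence into a protein-level probability, and of turning probabilities into an estimated FDR, can be sketched in Python as follows. The sketch assumes independent peptide identifications and leaves out ProteinProphet's adjustments, for example for the number of sibling peptides and for peptides shared among several proteins.

```python
import math
from typing import Sequence


def protein_probability(peptide_probs: Sequence[float]) -> float:
    """Probability that at least one peptide assigned to the protein is correct,
    assuming independent peptide identifications: 1 - prod(1 - p_i)."""
    return 1.0 - math.prod(1.0 - p for p in peptide_probs)


def estimated_fdr(protein_probs: Sequence[float], threshold: float) -> float:
    """Model-based FDR among proteins accepted at the threshold:
    the average of (1 - P) over proteins with P >= threshold."""
    accepted = [p for p in protein_probs if p >= threshold]
    return sum(1.0 - p for p in accepted) / len(accepted) if accepted else 0.0
```

A protein supported by peptides with probabilities 0.9 and 0.5 gets 1 - 0.1 x 0.5 = 0.95 under this independence assumption.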
iProphet: Shteynberg D., Deutsch E.W., Lam H., Eng J.K., Sun Z., Tasman N., Mendoza L., Moritz R.L., Aebersold R., Nesvizhskii A.I. iProphet: multi-level integrative analysis of shotgun proteomic data improves peptide and protein identification rates and error estimates. Mol Cell Proteomics. 10:M111.007690 (2011). Manuscript
* iProphet further improves upon PeptideProphet modeling via a multi-level integrative framework. It also allows the integration of results from multiple MS/MS database search tools.
Qualscore: Nesvizhskii A.I., Roos F.F., Grossmann J., Vogelzang M., Eddes J.S., Gruissem W., Baginsky S., Aebersold R. Dynamic spectrum quality assessment and iterative computational analysis of shotgun proteomic data: toward more efficient identification of post-translational modifications, sequence polymorphisms, and novel peptides. Mol Cell Proteomics. 5:652-70 (2006). Manuscript
* QualScore is a computational tool for assessing the quality of MS/MS spectra. It is designed to find unassigned high-quality MS/MS spectra that may represent novel peptides or peptides containing post-translational modifications.
Recommended Reviews and Tutorials:
Nesvizhskii A.I., Vitek O., Aebersold R. Analysis and validation of proteomic data generated by tandem mass spectrometry. Nat Methods. 4:787-97 (2007). Manuscript
Nesvizhskii A.I. A survey of computational methods and error rate estimation procedures for peptide and protein identification in shotgun proteomics. J Proteomics. 73:2092-123 (2010). Manuscript
Deutsch E.W., Mendoza L., Shteynberg D., Farrah T., Lam H., Tasman N., Sun Z., Nilsson E., Pratt B., Prazen B., Eng J.K., Martin D.B., Nesvizhskii A.I., Aebersold R. A guided tour of the Trans-Proteomic Pipeline. Proteomics. 10:1150-9 (2010). Manuscript
Nesvizhskii A.I., Aebersold R. Interpretation of shotgun proteomic data: the protein inference problem. Mol Cell Proteomics. 4:1419-40 (2005). Manuscript