This page describes the computational tools and software developed by the Nesvizhskii lab. You can learn more about our tools in the following courses:

CSHL Course on Proteomics
Cold Spring Harbor Laboratory (CSHL), New York
Since 2013, we have contributed a workshop (lecture and hands-on tutorial) on the analysis of AP-MS protein-protein interaction data.

US HUPO short course: Stable and Transient Protein-Protein Interactions: Discovery, Quantification and Validation
Short course at the US HUPO Conference
Instructors: Ileana Cristea, Alexey Nesvizhskii
First offered in 2012, this course covers fundamental and practical aspects of studying protein interactions, from experimental design to data analysis. It includes a practical demonstration of the CRAPome resource.

ASMS short course: Bioinformatics for Protein Identification
Description and Registration
Instructors: Alexey Nesvizhskii, David Tabb, Nuno Bandeira
First offered in 2011, the course seeks to familiarize proteomics researchers with the inner workings of the software that enables the analysis of mass spectrometry-based proteomics data. The course includes a hands-on tutorial on the Trans-Proteomic Pipeline.


BatMass is a mass spectrometry data visualization tool. It was designed as an extensible platform that provides basic functionality such as project management, raw mass spectrometry data access, various GUI widgets and extension points.

Download, manual, sample data:

Key Publications:
Avtonomov DM, Raskind A, Nesvizhskii AI. BatMass: a Java Software Platform for LC-MS Data Visualization in Proteomics and Metabolomics. J Proteome Res. 15:2500-9 (2016). Manuscript


DIA-Umpire is an open-source Java program for the computational analysis of data independent acquisition (DIA) mass spectrometry-based proteomics data. It enables untargeted peptide and protein identification and quantitation using DIA data, and also incorporates targeted extraction to reduce the number of missing quantitative values.

Download, manual, sample data:

Key Publications:
Tsou CC, Avtonomov D, Larsen B, Tucholska M, Choi H, Gingras AC, and Nesvizhskii AI. DIA-Umpire: comprehensive computational framework for data independent acquisition proteomics. Nature Methods (2015). doi:10.1038/nmeth.3255 Manuscript

Tsou CC, Tsai CF, Teo GC, Chen YJ, Nesvizhskii AI. Untargeted, spectral library-free analysis of data-independent acquisition proteomics data generated using Orbitrap mass spectrometers. Proteomics 16:2257-71 (2016). Manuscript


mapDIA is software for statistical analysis of differential expression using MS/MS fragment-level quantitative data from data independent acquisition (DIA) proteomics experiments. It offers a series of tools for essential data preprocessing, including a novel retention time-based normalization method and multiple peptide/fragment selection steps.
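mapDIA's retention time-based normalization is defined in its publication; purely as an illustration of the general idea, the sketch below groups fragments into fixed retention-time windows (a hypothetical 10-minute width) and rescales each sample within a window so that all samples have the same summed intensity. The function name and data layout are assumptions, not mapDIA's code or input format.

```python
from collections import defaultdict

def rt_window_normalize(fragments, window=10.0):
    """fragments: list of (retention_time, {sample: intensity}) tuples, all
    measured in the same samples. Returns a list of normalized intensity dicts."""
    # Group fragment indices by retention-time window
    windows = defaultdict(list)
    for i, (rt, _) in enumerate(fragments):
        windows[int(rt // window)].append(i)

    normalized = [dict(intensities) for _, intensities in fragments]
    for indices in windows.values():
        samples = list(fragments[indices[0]][1])
        # Summed intensity of each sample within this window
        totals = {s: sum(fragments[i][1][s] for i in indices) for s in samples}
        # Rescale every sample toward the mean window total
        target = sum(totals.values()) / len(totals)
        for i in indices:
            for s in samples:
                if totals[s] > 0:
                    normalized[i][s] = fragments[i][1][s] * target / totals[s]
    return normalized

data = [
    (1.0, {"A": 10, "B": 20}),   # sample B is twice as intense in this window
    (2.0, {"A": 30, "B": 60}),
    (15.0, {"A": 5, "B": 5}),    # a later window with no systematic bias
]
print(rt_window_normalize(data))
```

Normalizing locally rather than globally lets the correction follow intensity biases that drift along the chromatographic gradient.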

Download, manual, sample data:

Key Publications:
Teo G, Kim S, Tsou CC, Collins B, Gingras AC, Nesvizhskii AI, Choi H. mapDIA: Preprocessing and statistical analysis of quantitative proteomics data from data independent acquisition mass spectrometry. J Proteomics 129:108-20 (2015). Manuscript


The Contaminant Repository for Affinity Purification (CRAPome) is a database of annotated negative controls contributed by the proteomics research community. It addresses the common problem of distinguishing true interactions from non-specific background binding (also known as 'contaminants'). The database and associated computational tools for scoring protein interactions are available online. An intuitive web interface can be used to explore the database and to analyze user-uploaded data. Users can:
- query one protein at a time
- download background contaminant lists for selected experimental conditions, and/or
- upload their own data and perform analysis using SAINT or SAINTexpress and empirical scoring
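To show what an empirical score captures, here is a minimal fold-change sketch: a prey's mean spectral count in the bait purifications divided by its mean count in the negative controls. The function name, the pseudo-count and the example counts are hypothetical; the CRAPome's own empirical scores are defined differently and are computed for you by the web interface.

```python
from statistics import mean

def fold_change(bait_counts, control_counts, pseudo_count=1.0):
    """Mean prey count in bait purifications divided by mean count in
    negative controls; `pseudo_count` keeps the ratio finite for preys
    never observed in the controls."""
    return mean(bait_counts) / (mean(control_counts) + pseudo_count)

preys = {
    "PREY_A": ([10, 12, 8], [0, 1, 0, 0]),  # rarely seen in controls
    "PREY_B": ([3, 2, 4], [3, 4, 2, 3]),    # common background protein
}
for name, (bait, ctrl) in preys.items():
    print(name, round(fold_change(bait, ctrl), 2))
```

A prey that is frequent in the controls ends up with a low ratio even if it is readily detected with the bait, which is exactly the background the CRAPome is designed to flag.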


Key Publications:
The CRAPome: a contaminant repository for affinity purification - mass spectrometry data. D. Mellacheruvu et al. Nature Methods 10: 730-6 (2013) Manuscript


SAINT (Significance Analysis of INTeractome) comprises computational models and software for assigning confidence scores to protein-protein interactions in label-free quantitative AP-MS datasets. For each observed interaction with associated label-free quantification, SAINT calculates the probability of a true interaction. The modeling incorporates several data normalization steps and can also use quantitative information from negative control purifications to improve specificity in small- to intermediate-scale experiments (SAINT v. 2). The method was initially developed for label-free spectral count data and was later extended to MS1 intensity-based quantitative data (SAINT-MS1). SAINTexpress is a more recent, faster version of the algorithm.
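The idea of scoring a prey against its control distribution can be illustrated with a toy calculation. This is not SAINT's implementation: it assumes a single prey, a two-component Poisson mixture whose "false interaction" rate is the prey's mean count in the negative controls, and a hand-picked "true interaction" rate and prior (all hypothetical). SAINT itself estimates these distributions from the whole dataset and applies additional normalization.

```python
import math
from statistics import mean

def poisson_pmf(k: int, lam: float) -> float:
    # Probability of observing k spectra under a Poisson(lam) distribution
    return math.exp(-lam) * lam ** k / math.factorial(k)

def interaction_probability(count, control_counts, lam_true, prior_true=0.1):
    """Posterior probability that `count` comes from the 'true interaction'
    component; the 'false' rate is the prey's mean count in the controls."""
    lam_false = max(mean(control_counts), 0.1)  # floor avoids a zero rate
    p_true = prior_true * poisson_pmf(count, lam_true)
    p_false = (1 - prior_true) * poisson_pmf(count, lam_false)
    return p_true / (p_true + p_false)

controls = [0, 1, 0, 2]  # prey spectral counts in negative-control runs
print(round(interaction_probability(15, controls, lam_true=12.0), 3))
print(round(interaction_probability(1, controls, lam_true=12.0), 3))
```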

SAINT and SAINTexpress can be run online using the CRAPome:

Download source code:

SAINT Tools and Publication:
SAINT v 2: Choi, H., Larsen, B., Lin, Z.-Y., Breitkreutz, A., Mellacheruvu, D., Fermin, D., Qin, Z.S., Tyers, M., Gingras, A.-C. and Nesvizhskii, A.I. Nature Methods 8:70-3 (2010) Manuscript
* This is the key reference for the SAINT series of algorithms. It introduces the semi-supervised SAINT model, which compares the spectral counts of a prey in the bait purification with the count distribution of the same prey across the negative control runs.

SAINT v 1: Breitkreutz, A., Choi, H., Sharon, J., Boucher, L., Neduva, V., Larsen, B.G., Lin, Z.-Y., Breitkreutz, B.-J., Stark, C., Liu, G., Ahn, J., Dewar-Darch, D., Tang, X., Almeida, V., Qin, Z.S., Pawson, T., Gingras, A.-C., Nesvizhskii, A., Tyers, M. Global architecture of the yeast kinome interaction network, Science 328, 1043 - 104 (2010) Manuscript
* The first, unsupervised SAINT model, which does not require negative controls for scoring. It was designed for large-scale projects that profile more than 20-30 baits sharing very few interactions.

SAINT-MS1: Choi H., Glatter T., Gstaiger M., Nesvizhskii A.I. SAINT-MS1: Protein-protein interaction scoring using label-free intensity data in affinity purification-mass spectrometry experiments. Journal of Proteome Research (2012) 11(4):2619-24 Manuscript
* Extension of the model and the software to continuous (e.g. intensity-based) quantitative data.

SAINTexpress: Teo, G., Liu, G., Zhang, J., Gingras, A.C., Choi, H. SAINTexpress: improvements and additional features in Significance Analysis of INTeractome for AP-MS data. J Proteomics in press (2013) Manuscript
* A computationally efficient version of the SAINT tool, with an optional re-scoring based on externally acquired information.

Reviews and Tutorials:
Choi, H., Liu, G., Tyers, M., Gingras, A.-C. and Nesvizhskii, A.I. (2012) Analyzing protein-protein interactions from affinity purification-mass spectrometry data with SAINT. Curr Protoc Bioinformatics, Chapter 8:Unit8.15. Manuscript
* This is a detailed protocol for using SAINT. It describes the options (minFold, lowMode and Norm) that can be tailored to the dataset being analyzed.

Nesvizhskii, A.I. Computational and informatics strategies for identification of specific protein interaction partners in affinity purification mass spectrometry experiments. Proteomics 12:1639-55 (2012) Manuscript
* This is a detailed review on the analysis of AP-MS data.


ProHits is a Laboratory Information Management System (LIMS) for interaction proteomics, developed primarily by the Anne-Claude Gingras and Mike Tyers laboratories in collaboration with the Nesvizhskii lab. It is a comprehensive system that integrates the TPP/iProphet for peptide/protein identification and the SAINT suite of tools for interaction scoring.


Key Publications:
Liu G., Zhang J., Larsen B., Stark C., Breitkreutz A., Lin Z.Y., Breitkreutz B.J., Ding Y., Colwill K., Pasculescu A., Pawson T., Wrana J.L., Nesvizhskii A.I., Raught B., Tyers M., Gingras A.C. ProHits: integrated software for mass spectrometry-based interaction proteomics, Nat Biotech 28:1015-17 (2010). Manuscript
* This is the key manuscript that describes the ProHits LIMS system.

Liu G., Zhang J., Choi H., Lambert J.P., Srikumar T., Larsen B., Nesvizhskii A.I., Raught B., Tyers M., Gingras A.C. Using ProHits to Store, Annotate, and Analyze Affinity Purification - Mass Spectrometry (AP-MS) Data. Current Protocols in Bioinformatics Unit 8.16 Manuscript
* This is a Protocols manuscript that describes the installation and use of the ProHits system. It also introduces a simplified, virtual machine implementation (ProHits Lite software).

Liu G, Knight JD, Zhang JP, Tsou CC, Wang J, Lambert JP, Larsen B, Tyers M, Raught B, Bandeira N, Nesvizhskii AI, Choi H, Gingras AC. Data Independent Acquisition analysis in ProHits 4.0. J Proteomics (2016). Manuscript
* New version that incorporates DIA-Umpire and mapDIA tools


LuciPHOr is a program for localizing the sites of post-translational modifications (e.g. phosphorylation) on peptide sequences. LuciPHOr carries out simultaneous localization on all candidate sites in each peptide and estimates the false localization rate (FLR) using a target-decoy framework, in which decoy phosphopeptides, generated by placing artificial phosphorylation(s) on non-candidate residues, compete with the non-decoy phosphopeptides. LuciPHOr also reports approximate site-level confidence scores for all candidate sites, which makes it possible to localize additional sites in multiply phosphorylated peptides where localization can only be partially achieved. LuciPHOr is compatible with the output of any MS/MS database search engine processed through the Trans-Proteomic Pipeline.
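The target-decoy logic behind FLR estimation can be sketched with simple counting: at a given score cutoff, the number of decoy localizations passing the cutoff estimates how many target localizations passing it are wrong. This is a simplified estimator under that assumption, with hypothetical function names and made-up scores; the procedure published for LuciPHOr is more elaborate.

```python
def false_localization_rate(target_scores, decoy_scores, threshold):
    """Simplified FLR at `threshold`: decoy localizations scoring at or above
    the threshold divided by target localizations scoring at or above it."""
    n_target = sum(1 for s in target_scores if s >= threshold)
    n_decoy = sum(1 for s in decoy_scores if s >= threshold)
    if n_target == 0:
        return 0.0  # nothing accepted at this threshold
    return min(1.0, n_decoy / n_target)

def threshold_for_flr(target_scores, decoy_scores, max_flr=0.01):
    """Lowest observed target score at which the estimated FLR does not
    exceed max_flr (None if no such score exists)."""
    for t in sorted(set(target_scores)):
        if false_localization_rate(target_scores, decoy_scores, t) <= max_flr:
            return t
    return None

targets = [10, 8, 6, 4, 2]  # localization scores of target phosphopeptides
decoys = [5, 3, 1]          # scores of decoys with sites on non-candidate residues
print(false_localization_rate(targets, decoys, 4))
print(threshold_for_flr(targets, decoys, max_flr=0.0))
```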


Key Publication:
Fermin D., Walmsley S.J., Gingras A.C., Choi H., Nesvizhskii A.I. LuciPHOr: algorithm for phosphorylation site localization with false localization rate estimation using modified target-decoy approach. Molecular and Cellular Proteomics 12:3409-19 (2013) Manuscript


Luciphor2 re-implements the original Luciphor algorithm (see above) in Java and extends it to any post-translational modification. Luciphor2 offers several improvements over the previous version:
- It can run on any computer that supports Java
- It can score any PTM
- It can score results from any search tool
Like the original Luciphor, this release can process PeptideProphet XML files (pepXML). It can also read tab-delimited files containing scores from any search tool.


Key Publication:
Fermin D., Avtonomov D., Choi H., Nesvizhskii A.I. LuciPHOr2: site localization of generic post-translational modifications from tandem mass spectrometry data Bioinformatics 2014 [Epub ahead of print] PubMed PMID: 25429062 Manuscript


ABACUS is a computational tool for extracting label-free quantitative information (spectral counts) from MS/MS data sets. It aggregates data from multiple experiments, adjusts spectral counts to accurately account for peptides shared across multiple proteins, and performs common normalization steps. It can also output spectral count data at the gene level, simplifying the integration and comparison of gene and protein expression data. ABACUS is compatible with the widely used Trans-Proteomic Pipeline suite of tools and comes with a graphical user interface that makes the program easy to use. The main aim of ABACUS is to streamline the analysis of spectral count data by providing an automated, easy-to-use solution for extracting this information from proteomic data sets for subsequent, more sophisticated statistical analysis.
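One common way to handle a peptide shared by several proteins is to split its spectral count in proportion to each protein's unique-peptide evidence. The sketch below shows that proportional scheme under stated assumptions (hypothetical function and protein names, equal split when no protein has unique evidence); ABACUS's own adjustment is described in the publication and may differ in detail.

```python
def apportion_shared_count(unique_counts, shared_count):
    """Distribute one shared peptide's spectral count among the proteins that
    contain it, in proportion to their unique-peptide counts (equal split
    when none of them has unique evidence)."""
    total_unique = sum(unique_counts.values())
    if total_unique == 0:
        share = shared_count / len(unique_counts)
        return {prot: count + share for prot, count in unique_counts.items()}
    return {
        prot: count + shared_count * count / total_unique
        for prot, count in unique_counts.items()
    }

# Hypothetical proteins: PROT1 has 6 unique spectra, PROT2 has 2,
# and they share a peptide observed in 4 spectra.
print(apportion_shared_count({"PROT1": 6, "PROT2": 2}, 4))
```

Without such an adjustment, the shared spectra would either be counted twice (inflating both proteins) or dropped (losing information).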


Key Publications:
Fermin D., Basrur V., Yocum A.K., Nesvizhskii A.I. Abacus: A computational tool for extracting and pre-processing spectral count data for label-free quantitative proteomic analysis. Proteomics 7:1340-45 (2011) Manuscript


QPROT is software for differential protein expression analysis using MS1- and MS/MS-level continuous quantitative data. It features a hierarchical model with a predictive recursion algorithm, and includes percentile normalization and multithreading for fast computation.
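As an illustration of percentile normalization in general (QPROT's exact choice of percentile and intensity scale is not specified here), the sketch below rescales each sample so that its 75th-percentile intensity, a hypothetical default, matches the average 75th percentile across all samples. Function names are hypothetical.

```python
def percentile(values, q):
    """Percentile of `values` (q in [0, 100]) with linear interpolation."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

def percentile_normalize(samples, q=75):
    """Scale each sample so its q-th percentile equals the mean q-th
    percentile across samples."""
    refs = [percentile(s, q) for s in samples]
    target = sum(refs) / len(refs)
    return [[x * target / r for x in s] for s, r in zip(samples, refs)]

# Two hypothetical samples; the second was loaded at twice the amount.
print(percentile_normalize([[1, 2, 3, 4, 5], [2, 4, 6, 8, 10]]))
```

An upper percentile is often preferred over the total intensity because it is less sensitive to a handful of very abundant proteins.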


Key Publication:
Choi H, Kim S, Fermin D, Tsou CC, Nesvizhskii AI. QPROT: Statistical method for testing differential expression using protein-level intensity data in label-free quantitative proteomics. J Proteomics 129:121-6 (2015) Manuscript


QSPEC is software for the analysis of differential protein expression using label-free spectral count data. Its hierarchical model pools statistical information for mean and variance estimates across all proteins when only a limited number of replicates is available. In a typical quantitative proteomics experiment there are rarely enough replicates for conventional tests such as the t-test to be applicable. QSPEC addresses this problem by calculating, for each protein, a Bayes factor (a ratio of marginal likelihoods) for differential expression under specific model assumptions (Poisson-family distributions for count data and a Gaussian distribution for intensity data).
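The Bayes-factor idea can be sketched for a single protein. The sketch assumes Poisson spectral counts with a conjugate Gamma(a, b) prior on the rate (a and b are hypothetical hyperparameters) and compares a model with separate rates in two conditions against a model with one shared rate. Unlike QSPEC, it does not pool information across proteins; the function names are hypothetical.

```python
import math

def log_marginal(counts, a=1.0, b=1.0):
    """Log marginal likelihood of Poisson counts under a Gamma(a, rate=b)
    prior, omitting the sum of log(x_i!) terms, which cancel in the
    Bayes factor below."""
    s, n = sum(counts), len(counts)
    return (a * math.log(b) - math.lgamma(a)
            + math.lgamma(a + s) - (a + s) * math.log(b + n))

def log_bayes_factor(group_a, group_b, a=1.0, b=1.0):
    """log BF of 'different rates' versus 'one common rate' for one protein;
    positive values favor differential expression."""
    return (log_marginal(group_a, a, b) + log_marginal(group_b, a, b)
            - log_marginal(list(group_a) + list(group_b), a, b))

print(log_bayes_factor([20, 22, 19], [2, 3, 1]))    # clear change
print(log_bayes_factor([10, 11, 9], [10, 9, 11]))   # no change
```

Because the Gamma prior is conjugate to the Poisson likelihood, each marginal likelihood has a closed form and no numerical integration is needed.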

Web server:
Software download:

Key Publication:
Choi H, Fermin D, Nesvizhskii AI. Significance analysis of spectral count data in label-free shotgun proteomics. Mol Cell Proteomics 7:2373-85 (2008). Manuscript


A biclustering method for constructing protein complexes from (filtered) high-confidence interaction data from label-free quantitative AP-MS experiments. The method forms bait clusters, based on the similarity of their quantitative interaction profiles, as anchors of protein complexes, and identifies submatrices of prey proteins showing consistent quantitative association within the anchor bait clusters. The statistical model determines the optimal number of bait clusters and prey clusters in the data, automatically yielding the configuration of highly probable protein complexes.


Key Publications:
Choi H., Kim S., Gingras A.C., Nesvizhskii A.I. Analysis of Protein Complexes via Model-based Biclustering of Label-free Quantitative AP-MS Data. Mol. Syst. Biol. 6:385 (2010). Manuscript

Trans-Proteomic Pipeline

We developed core components of the widely used open-source Trans-Proteomic Pipeline (TPP) for primary processing of mass spectrometry-based proteomic data. The pipeline is currently maintained by the Seattle Proteome Center at the Institute for Systems Biology.


Google Discussion Group:

Key Publications:
PeptideProphet: Keller A., Nesvizhskii A.I., Kolker E., Aebersold R. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal Chem. 74:5383-92 (2002). Manuscript
* PeptideProphet performs statistical validation of peptide identifications from tandem mass (MS/MS) spectra. It can analyze the results of the most commonly used MS/MS database search tools, including SEQUEST, MASCOT, and X! Tandem.
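Because PeptideProphet assigns a probability to each identification, the expected error rate of any filtered list can be estimated directly from those probabilities: among identifications accepted at a probability cutoff, the estimated FDR is the mean of (1 - p). A minimal sketch with a hypothetical function name and made-up probabilities:

```python
def estimated_fdr(probabilities, min_prob):
    """Expected fraction of incorrect identifications among those with
    probability >= min_prob, computed as the mean of (1 - p)."""
    accepted = [p for p in probabilities if p >= min_prob]
    if not accepted:
        return 0.0  # nothing accepted, so no expected errors
    return sum(1.0 - p for p in accepted) / len(accepted)

psm_probs = [0.99, 0.95, 0.9, 0.5, 0.1]  # hypothetical PeptideProphet output
print(round(estimated_fdr(psm_probs, 0.9), 4))
```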

ProteinProphet: Nesvizhskii A.I., Keller A., Kolker E., Aebersold R. A statistical model for identifying proteins by tandem mass spectrometry. Anal Chem. 75:4646-58 (2003). Manuscript
* ProteinProphet takes as input the list of identified peptides, assembles peptides into proteins, and calculates probabilities and the FDR at the protein level.
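The basic step of combining peptide evidence into a protein probability can be written in a few lines. The sketch assumes independent peptide identifications and uses P = 1 - ∏(1 - p_i), the probability that at least one peptide is correct; ProteinProphet additionally adjusts peptide probabilities (e.g. for sibling peptides) and apportions shared peptides among proteins, which this sketch omits. The function name is hypothetical.

```python
import math

def protein_probability(peptide_probs):
    """Probability that at least one of the protein's peptides is correct,
    assuming independent identifications: P = 1 - prod(1 - p_i)."""
    return 1.0 - math.prod(1.0 - p for p in peptide_probs)

print(round(protein_probability([0.9, 0.5]), 4))
```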

iProphet: Shteynberg D., Deutsch E.W., Lam H., Eng J.K., Sun Z., Tasman N., Mendoza L., Moritz R.L., Aebersold R., Nesvizhskii A.I. iProphet: multi-level integrative analysis of shotgun proteomic data improves peptide and protein identification rates and error estimates. Mol Cell Proteomics. 10:M111.007690 (2011). Manuscript
* iProphet further improves upon PeptideProphet modeling via a multi-level integrative framework. It also allows the integration of results from multiple MS/MS database search tools.

Qualscore: Nesvizhskii A.I., Roos F.F., Grossmann J., Vogelzang M., Eddes J.S., Gruissem W., Baginsky S., Aebersold R. Dynamic spectrum quality assessment and iterative computational analysis of shotgun proteomic data: toward more efficient identification of post-translational modifications, sequence polymorphisms, and novel peptides. Mol Cell Proteomics. 5:652-70 (2006). Manuscript
* QualScore is a computational tool for assessing the quality of MS/MS spectra. It is designed to find unassigned high-quality MS/MS spectra that may represent novel peptides or peptides containing post-translational modifications.

Recommended Reviews and Tutorials:
Nesvizhskii A.I., Vitek O., Aebersold R. Analysis and validation of proteomic data generated by tandem mass spectrometry. Nat Methods. 4:787-97 (2007). Manuscript

Nesvizhskii A.I. A survey of computational methods and error rate estimation procedures for peptide and protein identification in shotgun proteomics. J Proteomics. 73:2092-123 (2010). Manuscript

Deutsch E.W., Mendoza L., Shteynberg D., Farrah T., Lam H., Tasman N., Sun Z., Nilsson E., Pratt B., Prazen B., Eng J.K., Martin D.B., Nesvizhskii A.I., Aebersold R. A guided tour of the Trans-Proteomic Pipeline. Proteomics. 10:1150-9 (2010). Manuscript

Nesvizhskii A.I., Aebersold R. Interpretation of shotgun proteomic data: the protein inference problem. Mol Cell Proteomics. 4:1419-40 (2005). Manuscript