MSFragger is an ultrafast database search tool for peptide identification in mass spectrometry-based proteomics. While we provide a stand-alone graphical user interface, FragPipe, for running MSFragger, here we describe the implementation of MSFragger as a processing node in the Thermo Scientific Proteome Discoverer (PD) environment. We also provide PeptideProphet (via Philosopher) as part of the PD processing node, enabling downstream processing of MSFragger search results in PD using either Percolator or PeptideProphet.
The following workflows have been tested and should be fully supported with our MSFragger and PeptideProphet (Philosopher) PD nodes:
- Conventional closed search, label-free or labeling-based (e.g. TMT)
- Open search
Compared to SEQUEST-HT/Percolator (using a publicly available HEK293 data set, PXD001468; conventional closed search), MSFragger/PeptideProphet (Philosopher) reduced the total processing time by at least a factor of four. For open searches, the speed improvement was considerably larger. Using MSFragger instead of SEQUEST-HT (with PeptideProphet or Percolator) also resulted in a significant increase in the number of identified proteins, peptides, and PSMs.
Installation of PD nodes
The MSFragger node can be used with Thermo Scientific Proteome Discoverer versions 2.2 and 2.3 (not suitable for v2.1 or older versions).
Please follow the steps below for the installation:
Step 1. Download the latest version of MSFragger-PDv22.rar or MSFragger-PDv23.rar (depending on your version of Proteome Discoverer) from our GitHub repository and decompress the file. Make sure the MSFragger JAR (.jar extension) file is in the same directory as the ‘ext’ folder, which contains the libraries needed for reading raw files directly.
Step 2. Make sure that Thermo Scientific Proteome Discoverer on your computer is closed.
Step 3. Open the folder where Thermo Scientific Proteome Discoverer is installed. To find the folder location, right-click on your Thermo Scientific Proteome Discoverer desktop icon and then click on “Properties”. The folder path is shown in the field called Target (as shown in the Figure).
Step 4. Copy “MSFragger-PD.dll” from the unzipped file to that folder. Please make sure to delete any old versions of MSFragger-PD.dll.
Step 5. Open Thermo Scientific Proteome Discoverer, select the licensing page and click on “Scan for Missing Features”.
Step 6. Restart Thermo Scientific Proteome Discoverer and you should see MSFragger and PeptideProphet in your processing nodes (as shown in the Figure).
Step 7. Download the latest versions of MSFragger and Philosopher tools from the corresponding GitHub repositories. They are NOT included with the MSFragger-PD.dll wrapper program.
MSFragger: https://github.com/Nesvilab/MSFragger Please follow instructions for obtaining the JAR binary file of MSFragger.
Philosopher: https://github.com/Nesvilab/philosopher You will most likely need the following file: philosopher_windows_amd64.exe
NOTE: the original license agreements for MSFragger and Philosopher also apply when used within the PD environment.
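The folder layout requested in Step 1 (the MSFragger JAR next to the ‘ext’ folder) can be verified before opening Proteome Discoverer. The following is a minimal Python sketch, not part of the MSFragger-PD distribution; the function name `check_msfragger_layout` is hypothetical, and it assumes only that the JAR has a `.jar` extension and that the libraries folder is named `ext`.

```python
from pathlib import Path


def check_msfragger_layout(folder):
    """Return a list of problems found in an unpacked MSFragger folder.

    Hypothetical helper: an empty list means the JAR and the 'ext'
    folder sit side by side, as the installation steps require.
    """
    folder = Path(folder)
    problems = []
    # At least one .jar file is expected directly inside the folder.
    if not sorted(folder.glob("*.jar")):
        problems.append("no .jar file found in " + str(folder))
    # The 'ext' folder must be a sibling of the JAR, not nested elsewhere.
    if not (folder / "ext").is_dir():
        problems.append("'ext' folder not found next to the JAR")
    return problems
```

Calling `check_msfragger_layout(r"C:\path\to\MSFragger")` (an example path) returns an empty list when both items are present, or a short description of each missing item otherwise.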
Java SE Runtime Environment 8 (or above) must be installed before using the MSFragger-PD node.
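To confirm that a suitable Java runtime is on the PATH, the version banner printed by `java -version` can be inspected. The sketch below is an illustration only; the helper names `parse_java_major` and `installed_java_major` are hypothetical. It assumes the banner contains a quoted version string, either in the legacy form (`"1.8.0_281"` for Java 8) or the newer form (`"11.0.2"` for Java 11).

```python
import re
import subprocess


def parse_java_major(version_output):
    """Extract the major Java version from `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    # Legacy scheme: "1.8.0_281" is Java 8. Newer scheme: "11.0.2" is Java 11.
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def installed_java_major():
    """Run `java -version` and return the major version, or None if unavailable."""
    try:
        result = subprocess.run(
            ["java", "-version"], capture_output=True, text=True
        )
    except OSError:
        # The java executable was not found or could not be started.
        return None
    # The JVM normally writes its version banner to stderr.
    return parse_java_major(result.stderr + result.stdout)
```

A result of 8 or higher from `installed_java_major()` satisfies the requirement above; `None` suggests that Java is missing or not on the PATH.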
Input files can be in RAW, mzML, or mzXML format.
IMPORTANT: If you choose to use mzML format (instead of RAW), please DO NOT use “zlib compression” during file conversion because Proteome Discoverer currently does not support the compression function. An example of parameter setting in file conversion using ProteoWizard/MSconvert is shown in the Figure.
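An already converted mzML file can be screened for zlib compression before it is loaded into Proteome Discoverer. mzML annotates each binary array with a controlled-vocabulary term: `MS:1000574` denotes "zlib compression" and `MS:1000576` denotes "no compression". The sketch below simply scans the file text for the zlib accession. The function name `mzml_uses_zlib` is hypothetical, and the line-by-line scan assumes a conventionally formatted mzML file with line breaks.

```python
ZLIB_ACCESSION = "MS:1000574"  # PSI-MS controlled-vocabulary term "zlib compression"


def mzml_uses_zlib(path):
    """Return True if the mzML file declares zlib-compressed binary arrays."""
    # Read line by line so that large files are never held in memory at once.
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if ZLIB_ACCESSION in line:
                return True
    return False
```

If the function returns `True`, the file should be converted again with zlib compression turned off.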
- For searches with multiple PTMs (e.g. phosphorylation) or non-specific digestion searches, MSFragger requires a significant amount of available RAM. For such searches, we recommend at least 32 GB of memory, ideally 64 GB or more. For normal tryptic searches, or open searches, even 16 GB should be sufficient. If you would like to perform searches that require a significant amount of RAM, we recommend that you use FragPipe instead. FragPipe provides an option for splitting the protein sequence database, thus circumventing the memory limitations.
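The memory guidance above can be written down as a small lookup, which may help when planning searches on a shared workstation. This is only a sketch of the numbers stated in the text: the search-type labels and the function name `ram_advice` are hypothetical, and the amount of available memory has to be supplied by the user.

```python
# Guidance from the text, in GB, as (minimum, ideal). Keys are hypothetical labels.
RAM_GUIDANCE_GB = {
    "tryptic": (16, 16),
    "open": (16, 16),
    "multi_ptm": (32, 64),
    "nonspecific": (32, 64),
}


def ram_advice(search_type, available_gb):
    """Compare available memory with the recommended range for a search type."""
    minimum, ideal = RAM_GUIDANCE_GB[search_type]
    if available_gb < minimum:
        # Below the stated minimum: FragPipe can split the sequence database.
        return "insufficient: consider FragPipe with a split database"
    if available_gb < ideal:
        return "workable, but more memory is preferable"
    return "sufficient"
```

For example, `ram_advice("multi_ptm", 32)` falls between the recommended minimum and the ideal amount, while `ram_advice("tryptic", 16)` meets the stated guidance.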
How to use
Step 1. Select the binary files of MSFragger and Philosopher in their parameter window.
Step 2. Import your protein FASTA file via “Maintain FASTA Files” in PD. Specify the database in the MSFragger node.
Step 3. When using PeptideProphet, please ensure “Validation Mode=Control peptide level error rate (Calculate missing q-values for PSMs)” in PeptideValidator (Consensus Step) such that the q-values are calculated during the validation.
NOTE: Our MSFragger-PD node currently only supports running one processing workflow (allowing multiple input files) at a time.
We also provide three processing workflows and two consensus workflows for your reference. These can be found in the .rar files you downloaded.
- Processing_MSFragger_Percolator_ClosedSearch.pdProcessingWF: Use MSFragger and Percolator for conventional closed search.
- Processing_MSFragger_PeptideProphet_ClosedSearch.pdProcessingWF: Use MSFragger and PeptideProphet for conventional closed search.
- Processing_MSFragger_PeptideProphet_OpenSearch.pdProcessingWF: Use MSFragger and PeptideProphet for open search.
- Consensus_Percolator.pdConsensusWF: The consensus workflow for MSFragger and Percolator.
- Consensus_PeptideProphet.pdConsensusWF: The consensus workflow for PeptideProphet.
NOTE: Because of minor version differences (e.g., PD v22.214.171.1245 and PD v126.96.36.1998), the workflows may sometimes fail to load even when the same PD version is used. The MSFragger parameter files (for closed and open searches) are therefore provided along with the workflows for your reference.
We recommend the following publicly available HEK293 data set (PXD001468) for testing, since this is what we typically use for testing as well.
For documentation on MSFragger itself (hardware requirements, search parameters, etc.), see the MSFragger Documentation Wiki page. For Philosopher (PeptideProphet), see the Philosopher repository (https://github.com/Nesvilab/philosopher).
Questions and Technical Support
How to Cite
If you use MSFragger in Proteome Discoverer, we ask that you cite the manuscript describing the PD node (currently in preparation), as well as the manuscripts describing the key individual components:
Kong AT, Leprevost FV, Avtonomov DM, Mellacheruvu D, Nesvizhskii AI. MSFragger: ultrafast and comprehensive peptide identification in mass spectrometry-based proteomics. Nature Methods 14:513–520 (2017).
Leprevost F. et al., Philosopher: a complete toolkit for shotgun proteomics data analysis. Manuscript in preparation.
For other tools developed by the Nesvizhskii lab, go to our website: www.nesvilab.org