
Welcome to MSFragger/Philosopher PD Nodes

The implementation of MSFragger and Philosopher (PeptideProphet) as Proteome Discoverer nodes

Introduction

MSFragger is an ultrafast database search tool for peptide identification in mass spectrometry-based proteomics. While we provide a stand-alone graphical user interface, FragPipe, for running MSFragger, here we describe the implementation of MSFragger as a processing node in the Thermo Scientific Proteome Discoverer (PD) environment. We also provide PeptideProphet (via Philosopher) as a PD processing node, so that MSFragger search results can be processed downstream in PD using either Percolator or PeptideProphet.

The following workflows have been tested and should be fully supported with our MSFragger and PeptideProphet(Philosopher) PD nodes:

Compared to SEQUEST-HT/Percolator (using a publicly available HEK293 data set, PXD001468; conventional closed search), MSFragger/PeptideProphet (Philosopher) reduced the total processing time by more than a factor of four. For open searches, the improvement in processing speed was even greater. Using MSFragger instead of SEQUEST-HT (with PeptideProphet or Percolator) also resulted in a significant increase in the number of identified proteins, peptides, and PSMs.

Installation of PD nodes

The MSFragger node can be used with Thermo Scientific Proteome Discoverer versions 2.2 and 2.3 (not suitable for v2.1 or older versions).

Please follow the steps below for the installation:

Step 1. Download the latest version of MSFragger-PDv22.rar or MSFragger-PDv23.rar (depending on your version of Proteome Discoverer) from our GitHub repository and decompress the file. Make sure the MSFragger JAR (.jar extension) file is in the same directory as the ‘ext’ folder, which contains the libraries needed for reading .raw files.

Step 2. Make sure that Proteome Discoverer is closed.

Step 3. Open the folder where Thermo Scientific Proteome Discoverer is installed. To find the folder location of Thermo Scientific Proteome Discoverer, right click on your Thermo Scientific Proteome Discoverer desktop icon and then click on “Properties”. The folder path is shown in the field called Target (as shown below).

Step 4. Copy “MSFragger-PD.dll” from the unzipped file to that folder. Delete any old versions of MSFragger-PD.dll.

Step 5. Open Thermo Scientific Proteome Discoverer, select the licensing page and click on “Scan for Missing Features”.

Step 6. Restart Thermo Scientific Proteome Discoverer and you should see MSFragger and PeptideProphet in your processing nodes (as shown in the Figure).

Step 7. Download the latest versions of MSFragger and Philosopher tools from the corresponding GitHub repositories. They are NOT included with the MSFragger-PD.dll wrapper program.

MSFragger: https://github.com/Nesvilab/MSFragger Please follow the instructions there for obtaining the MSFragger JAR binary file.

Philosopher: https://github.com/Nesvilab/philosopher You will most likely need the following file: philosopher_windows_amd64.exe

NOTE: the original license agreements for MSFragger and Philosopher also apply when used within the PD environment.
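The file layout described in Steps 1 and 4 can be checked with a short script before starting Proteome Discoverer. The following is a minimal Python sketch and is not part of the MSFragger-PD distribution; the function name check_install is hypothetical, and both folder arguments are locations you supply yourself.

```python
from pathlib import Path


def check_install(msfragger_dir, pd_dir):
    """Return a list of problems found; an empty list means the layout looks right."""
    msfragger_dir, pd_dir = Path(msfragger_dir), Path(pd_dir)
    problems = []

    # Step 1: the MSFragger JAR must sit in the same directory as the 'ext' folder.
    has_jar = msfragger_dir.is_dir() and any(
        p.is_file() and p.suffix.lower() == ".jar" for p in msfragger_dir.iterdir()
    )
    if not has_jar:
        problems.append(f"no .jar file found in {msfragger_dir}")
    if not (msfragger_dir / "ext").is_dir():
        problems.append(f"no 'ext' folder found in {msfragger_dir}")

    # Step 4: the wrapper DLL must be in the Proteome Discoverer folder.
    if not (pd_dir / "MSFragger-PD.dll").is_file():
        problems.append(f"MSFragger-PD.dll not found in {pd_dir}")

    return problems
```

Calling check_install with the folder that holds the MSFragger JAR and the Proteome Discoverer installation folder returns one message per missing item, so an empty list suggests that the files are where the steps above expect them.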

Requirements

IMPORTANT: If you choose to use mzML format (instead of RAW), DO NOT use “zlib compression” during file conversion because Proteome Discoverer does not currently support reading compressed mzML data. An example of the parameter settings for file conversion using ProteoWizard/MSConvert is shown in the Figure.
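One way to confirm that a converted file is free of zlib compression is to look for the PSI-MS controlled vocabulary term that mzML uses to mark zlib-compressed binary arrays (accession MS:1000574). The following Python sketch is not part of the PD nodes; the function name uses_zlib is hypothetical, and it performs a plain byte scan rather than full XML parsing.

```python
def uses_zlib(mzml_path, chunk_size=1 << 20):
    """Return True if the mzML file mentions the 'zlib compression' CV term."""
    marker = b"MS:1000574"  # PSI-MS accession for "zlib compression"
    tail = b""
    with open(mzml_path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                return False
            window = tail + chunk
            if marker in window:
                return True
            # Keep the last few bytes so a marker split across two reads
            # is still detected on the next iteration.
            tail = window[-(len(marker) - 1):]
```

A file for which uses_zlib returns True was most likely converted with zlib compression enabled and should be converted again with that option turned off.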

How to use

Step 1. Select the binary files of MSFragger and Philosopher in their parameter window.

Step 2. Import your protein FASTA file via “Maintain FASTA Files” in PD. Specify the database in the MSFragger node.

MSFragger Philosopher

Step 3. When using PeptideProphet, please ensure “Validation Mode=Control peptide level error rate (Calculate missing q-values for PSMs)” in PeptideValidator (Consensus Step) such that the q-values are calculated during the validation.

Step 4. To run multiple processing workflows in parallel (batch mode), please:

(1) set the value of “Max. Number of Processing Workflows in Parallel Execution” (at Administrator -> Configuration -> Parallel Job Execution) to 2 or above (as shown in the figure below).

(2) manually set the RAM (GB) of the MSFragger node in each workflow (as shown in the figure below). A simple way to calculate the value is to divide the available RAM by the number of processing workflows running at the same time. For example, if you want to run 4 processing workflows at the same time with 64 GB of RAM available, please set 16 GB (=64/4) of RAM in the MSFragger node of each workflow. We recommend running multiple processing workflows in parallel only for small data sets. If you have a large data set, please set the value of “Max. Number of Processing Workflows in Parallel Execution” to 1, so that the processing workflows run sequentially.
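The per-workflow RAM value is a plain division of the available memory by the number of parallel workflows. A minimal Python sketch of that arithmetic follows; the function name ram_per_workflow is hypothetical and is not an MSFragger or PD setting.

```python
def ram_per_workflow(available_gb, n_workflows):
    """Split the available RAM (in GB) evenly across parallel workflows."""
    if n_workflows < 1:
        raise ValueError("n_workflows must be at least 1")
    # Round down so the combined allocation never exceeds what is available.
    return available_gb // n_workflows


print(ram_per_workflow(64, 4))  # prints 16
```

Rounding down is a deliberate choice here: it keeps the sum of all allocations at or below the available RAM when the division is not exact.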

Processing/Consensus Workflows

We also provide three processing workflows and two consensus workflows for your reference. These can be found in the .rar files you downloaded.

NOTE: Because of minor version differences (e.g., PD v2.2.0.385 versus PD v2.2.0.388), the workflows may sometimes fail even when you use the same PD version. The MSFragger parameter files (for closed and open searches) are therefore provided along with the workflows for your reference.

Test Data

We recommend the following publicly available HEK293 data set (PXD001468) for testing, since this is what we typically use for testing as well.

Documentation

For documentation on MSFragger itself (hardware requirements, search parameters, etc.), see the MSFragger Documentation Wiki page. For Philosopher (PeptideProphet), see the Philosopher GitHub repository (https://github.com/Nesvilab/philosopher).

Questions and Technical Support

Please post all questions and bug reports regarding MSFragger itself on the MSFragger GitHub page or, if more appropriate, on the Philosopher page.

How to Cite

If you use MSFragger in Proteome Discoverer, we ask that you cite the manuscript describing the PD node (currently in preparation), as well as the manuscripts describing the key individual components:

Kong AT, Leprevost FV, Avtonomov DM, Mellacheruvu D, Nesvizhskii AI. MSFragger: ultrafast and comprehensive peptide identification in mass spectrometry-based proteomics. Nature Methods 14:513–520 (2017). Manuscript.

Leprevost F. et al., Philosopher: a complete toolkit for shotgun proteomics data analysis. Manuscript in preparation.

For other tools developed by the Nesvizhskii lab, go to our website: www.nesvilab.org