Label-free quantification with FragPipe
This tutorial demonstrates label-free quantification with match-between-runs using a dataset (PRIDE/ProteomeXchange identifier PXD020556) in which HCT116 cells were treated with aspirin (acetylsalicylic acid) to investigate metabolic changes. Extracts from willows and other salicylate-rich plants have been used medicinally since Mesopotamian times. This tutorial will use a subset of the data to quantify the proteomes of aspirin-treated and untreated (control) cells. These data were acquired with a Q Exactive HF-X.
Associated publication: Castoldi, Francesca, et al. “Autophagy-mediated metabolic effects of aspirin.” Cell Death Discovery 6.1 (2020): 1-17.
To get the input data, download the ‘lfq-raw.zip’ file from Dropbox and extract the files.
- Open FragPipe
- Load the data
- Load the LFQ-MBR workflow
- Fetch a sequence database
- Inspect the search and quantification settings
- Set the output location and run
- Inspect the results
Open FragPipe
When you launch FragPipe, check that MSFragger, IonQuant, and Philosopher are configured. (If you haven’t downloaded them yet, use their respective ‘Download / Update’ buttons. Please see the tutorials here and here. Python is not needed for these exercises.)
Load the data
On the ‘Workflow’ tab, drag and drop the six .raw spectral files or use the ‘Add files’ button to browse for them. We are using a subset of the full dataset with annotations shown below (the full list of file annotations can be found in the ‘Design.xls’ file).
Once you’ve added the files, you can annotate them by editing the ‘Experiment’ and ‘Bioreplicate’ fields manually or in batches with the ‘Set experiment/replicate’ button. The data type should be automatically detected as DDA.
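The same annotation can also be prepared as a file. The sketch below builds the rows of a tab-separated manifest (path, experiment, bioreplicate, data type), the layout FragPipe is understood to use when saving and loading manifests from the ‘Workflow’ tab. The file names, the group labels, and the three-versus-three split are hypothetical placeholders, not the actual contents of ‘lfq-raw.zip’. Compare the result against a manifest saved from your own FragPipe version before relying on it.

```python
from pathlib import Path


def build_manifest_rows(raw_dir, groups, data_type="DDA"):
    """Return (path, experiment, bioreplicate, data_type) tuples.

    `groups` maps an experiment label to the file names in that group;
    bioreplicates are numbered 1..n within each experiment.
    """
    rows = []
    for experiment, names in groups.items():
        for replicate, name in enumerate(names, start=1):
            path = str(Path(raw_dir) / name)
            rows.append((path, experiment, str(replicate), data_type))
    return rows


def write_manifest(rows, out_path):
    """Write one tab-separated line per spectral file."""
    with open(out_path, "w", newline="") as handle:
        for row in rows:
            handle.write("\t".join(row) + "\n")


# Hypothetical file names and grouping, for illustration only.
example_groups = {
    "control": ["control_1.raw", "control_2.raw", "control_3.raw"],
    "aspirin": ["aspirin_1.raw", "aspirin_2.raw", "aspirin_3.raw"],
}
rows = build_manifest_rows("lfq-raw", example_groups)
```

Calling `write_manifest(rows, "lfq.fp-manifest")` would then write the six lines to disk; the output file name is likewise an assumption.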
Load the LFQ-MBR workflow
Still on the ‘Workflow’ tab, select the LFQ-MBR workflow from the dropdown menu, then click ‘Load’.
This sets all the analysis steps for a closed database search with MSFragger, rescoring with MSBooster and Percolator, protein inference with ProteinProphet, filtering with Philosopher, and label-free quantification with FDR-controlled match-between-runs with IonQuant.
Fetch a sequence database
On the ‘Database’ tab, click ‘Download’, which will prompt you to first choose a file location to store the database. (Alternatively, you can choose to use the database from Dropbox.)
Once you’ve chosen a folder, click ‘Select directory’ to proceed to the database options. We will keep the default options (human, reviewed sequences, add common contaminants) for this dataset.
Click ‘Yes’ to download the database. When it’s finished, you should see that the FASTA file path now points to the new database.
Inspect the search and quantification settings
On the ‘MSFragger’ tab, you can see the parameters that have been set by loading the workflow.
To save time in the search (at the expense of slightly lower sensitivity), you can optionally set ‘Calibration and Optimization’ to ‘None’ in the ‘Peak Matching’ section. We don’t need to change any other search settings for this analysis, but you could optionally add acetyl lysine as a variable modification by editing the allowed sites for +42.0106 (already included on protein N-term in the workflow) since one aim of the study was to examine changes in acetylation.
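As a quick check on the +42.0106 value, the mass shift of acetylation can be recomputed from its net elemental change. Acetylation adds C2H2O to the modified residue. The sketch below sums standard monoisotopic masses; the function and variable names are illustrative only.

```python
# Monoisotopic masses (Da) of the most abundant isotopes.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "O": 15.99491461956}


def monoisotopic_mass(composition):
    """Sum element masses for a composition such as {"C": 2, "H": 2, "O": 1}."""
    return sum(MONOISOTOPIC[element] * count for element, count in composition.items())


# Net change on acetylation: +C2H2O.
ACETYL_DELTA = {"C": 2, "H": 2, "O": 1}
acetyl_mass = monoisotopic_mass(ACETYL_DELTA)

print(round(acetyl_mass, 4))  # 42.0106
```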
On the ‘Quant (MS1)’ tab, you can see the settings that will be used for label-free quantification. Note that IonQuant will be used and ‘Match between runs (MBR)’ is enabled. The ‘MaxLFQ’ quantification method is selected by default, and MaxLFQ values will be reported in addition to abundances calculated using the topN method.
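To give a feel for what a top-N abundance is, the sketch below implements one common variant: the protein abundance is the sum of the N most intense ions assigned to that protein. This is an illustration under stated assumptions, not IonQuant's actual implementation. The function name, the default of N = 3, and the choice to ignore zero intensities are all assumptions made here.

```python
def top_n_abundance(ion_intensities, n=3):
    """Sum the n largest positive ion intensities for one protein.

    Zero or missing-as-zero intensities are ignored; if fewer than n ions
    were observed, all observed ions are summed.
    """
    observed = [value for value in ion_intensities if value > 0]
    return sum(sorted(observed, reverse=True)[:n])


# Made-up intensities for a single protein in a single run.
example = [5.0, 1.0, 3.0, 0.0, 2.0]
abundance = top_n_abundance(example)  # uses 5.0, 3.0 and 2.0
```

MaxLFQ takes a different approach: it derives protein abundances from peptide intensity ratios between runs rather than from the most intense ions within each run. That is why both values can be useful to compare.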
Set the output location and run
On the ‘Run’ tab, use ‘Browse’ to make a new folder for the output files. Then click the ‘RUN’ button to start the analysis.
When the run is finished, ‘DONE’ will be printed at the end of the text in the console.
Inspect the results
Sample results can be found in the ‘lfq-results.zip’ file from Dropbox. In the output location, you will find combined reports (including the ‘MSstats.csv’ table, compatible with MSstats) as well as folders for each sample.
Inside each individual folder, a separate set of reports is created for just that sample.
A guide to output files, with descriptions of each column in the reports, can be found here.
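For a first look at any of the combined reports outside a spreadsheet program, a minimal sketch like the one below lists the column names and counts the data rows using only the Python standard library. The in-memory table and its column names are made up for the demonstration. The real column names are described in the output guide.

```python
import csv
import io


def summarize_report(handle, delimiter="\t"):
    """Return (column_names, number_of_data_rows) for a delimited report."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    n_rows = sum(1 for _ in reader)
    return header, n_rows


# Tiny in-memory stand-in for a report; the column names are hypothetical.
demo = io.StringIO("Protein\tSampleA\tSampleB\nP1\t10\t12\nP2\t0\t7\n")
header, n_rows = summarize_report(demo)
```

For a file on disk, open it with `open(path, newline="")` and pass the handle in, using `delimiter=","` for comma-separated tables such as ‘MSstats.csv’.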