Building spectral libraries for DIA analysis

FragPipe can be downloaded here. Follow the instructions on that same Releases page to launch the program. See here for help configuring FragPipe.

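FragPipe has several options for building spectral libraries for DIA data analysis:

  1. Build a library from DDA data
  2. Build a library directly from DIA data
  3. Build a library from combined DDA and DIA data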

FragPipe-generated libraries can then be used for quantification with DIA-NN.

Skyline users may also import interact-*.pep.xml files into Skyline for spectral library building and further analysis of DIA experiments; see this tutorial.

Please note that only the first option (library building from DDA) currently works with timsTOF data. In this case, timsTOF DDA-PASEF raw files (.d) can be used directly for spectral library building, as can .mgf files converted by Bruker DataAnalysis. To use .mgf files, each one needs to be copied out of its .d folder into a new joint folder (FragPipe cannot currently “see” .mgf files inside .d folders).
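
The copy step for .mgf files can be scripted. The following is a minimal sketch, not part of FragPipe: the function name collect_mgf and both folder arguments are hypothetical, and it assumes the .d folders sit directly under one parent folder.

```python
import shutil
from pathlib import Path


def collect_mgf(source_root: Path, joint_dir: Path) -> list[Path]:
    """Copy every .mgf found inside a .d folder under source_root into joint_dir.

    Hypothetical helper: assumes the .d folders are direct children of source_root.
    """
    joint_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for d_folder in sorted(source_root.glob("*.d")):
        if not d_folder.is_dir():
            continue
        for mgf in sorted(d_folder.rglob("*.mgf")):
            target = joint_dir / mgf.name
            # Refuse to overwrite: two runs could export .mgf files with the same name.
            if target.exists():
                raise FileExistsError(f"{target} already exists; rename one of the sources")
            shutil.copy2(mgf, target)
            copied.append(target)
    return copied
```

For example, collect_mgf(Path("timstof_runs"), Path("timstof_mgf")) would leave the original .d folders untouched and place copies in timstof_mgf, which can then be loaded into FragPipe.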

The dataset used below for illustration was downloaded from PXD011691. It includes 10 samples analyzed using DIA (10 mouse brain tissue samples, with UPS proteins spiked in at varying concentrations). It also includes 6 DDA runs (a pool of the same 10 brain tissues, with peptides separated into 6 fractions) collected for building a spectral library. Download a subset of the dataset (‘speclib-raw.zip’, containing 2 DIA and 2 DDA files in mzML format plus a sequence database) from Dropbox here to use for these demos. Example results for each workflow can also be found in this Dropbox folder.

Build a library from DDA data

  1. In the ‘Workflow’ tab of FragPipe, select the ‘SpecLib’ workflow from the dropdown menu and click ‘Load’.
  2. Load DDA spectral files in mzML or raw format. (In the example below, 6 DDA files corresponding to 6 fractionated peptide samples were loaded. Two of these fractions are provided in the dataset for this tutorial.)
  3. In the ‘Database’ tab, download a new database or select an existing one, e.g., the file 2021-05-13-decoys-UPS-reviewed-contam-UP000000589.fas downloaded from Dropbox. (In this case, a mouse database was downloaded with reviewed sequences, decoys, common contaminants, and iRT peptides. UPS protein sequences were also added manually.)

Note: Change the ‘RT calibration’ option on the ‘Spec Lib’ tab to ‘iRT’ if these peptides have been spiked in. EasyPQP will use the ciRT option by default.

  4. On the ‘Run’ tab, choose the location to output the results and click ‘RUN’.
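
Before running, it can be worth confirming that the selected database really contains decoys. The sketch below counts FASTA headers. The function name is hypothetical, and the default decoy prefix ‘rev_’ is an assumption; check the prefix actually used in your database and pass it explicitly if it differs.

```python
from pathlib import Path


def count_fasta_entries(fasta_path: Path, decoy_prefix: str = "rev_") -> dict[str, int]:
    """Count total, decoy, and target entries in a FASTA file.

    decoy_prefix is an assumption ("rev_"); adjust it to match your database.
    """
    total = 0
    decoys = 0
    with open(fasta_path, encoding="utf-8") as handle:
        for line in handle:
            if line.startswith(">"):
                total += 1
                # The header text starts right after the ">" character.
                if line[1:].startswith(decoy_prefix):
                    decoys += 1
    return {"total": total, "decoys": decoys, "targets": total - decoys}
```

A database prepared with decoys would normally report roughly equal numbers of targets and decoys; a decoy count of zero suggests either a missing decoy set or a different prefix.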


Build a library directly from DIA data

  1. Select the ‘DIA-Umpire_SpecLib’ workflow from the dropdown menu and click ‘Load’.
  2. Load DIA spectral files in .mzML or .raw format. (In the example below, 10 DIA runs were loaded; only 2 of these are provided for this tutorial.)
  3. On the ‘Umpire’ tab, choose the appropriate settings:
    • Change ‘Max Missed Scans’ to 2 if building a library from DIA data only (slower run time but higher identification sensitivity).
    • Check ‘Remove Background’ if building a hybrid DDA+DIA library (see below) and if there are many DIA runs (fastest run time).
    • Uncheck ‘Mass Defect Filter’ if the DIA data were generated from modification-enriched peptides (e.g. phospho), or if you are interested in extended PTM searches.
  4. In the ‘Database’ tab, download or select an existing sequence database.
  5. On the ‘Run’ tab, choose the location to output the results and click ‘RUN’.

Note: If DIA-Umpire fails or is interrupted, the temporary files it leaves behind will cause problems when the process is run again. Delete any temporary files that were generated alongside the raw/mzML files before re-running FragPipe.
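
To find candidate leftovers without deleting anything, a dry-run listing like the sketch below may help. It is not a FragPipe feature. It assumes, without confirmation from the tool, that temporary files share the name stem of the input file they belong to; the function name and INPUT_SUFFIXES are hypothetical. Review the list by hand before removing anything.

```python
from pathlib import Path

# Input spectral formats named in this tutorial; compared case-insensitively.
INPUT_SUFFIXES = {".mzml", ".raw"}


def list_leftovers(data_dir: Path) -> list[Path]:
    """List entries next to the input files that start with an input file's stem.

    Assumption: temporary files are named after the input they were derived from.
    Nothing is deleted; the caller decides what to remove.
    """
    entries = sorted(data_dir.iterdir())
    inputs = [p for p in entries if p.suffix.lower() in INPUT_SUFFIXES]
    stems = [p.stem for p in inputs]
    leftovers = []
    for entry in entries:
        if entry in inputs:
            continue  # never flag the input files themselves
        if any(entry.name.startswith(stem) for stem in stems):
            leftovers.append(entry)
    return leftovers
```

Unrelated files in the same folder are not listed, but a file that merely happens to share a stem with an input would be, which is why the function only reports and never deletes.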


Build a library from combined DDA and DIA data

This workflow is composed of two steps:

A) run just the DIA-Umpire step (DIA-Umpire workflow) to extract pseudo-MS/MS spectra from DIA data, then

B) build the library from the pseudo-MS/MS (pseudo-DDA) files together with the DDA files.

  1. For part A, load the ‘DIA_Umpire’ workflow, add DIA mzML files, and adjust the DIA-Umpire parameters as needed (see above). Select the output destination and click ‘RUN’. When DIA-Umpire is finished, the output folder will contain three pseudo-MS/MS (pseudo-DDA) files for each input mzML, with the suffixes _Q1.mzML, _Q2.mzML, and _Q3.mzML.

  2. For part B (shown below), first load the ‘SpecLib’ workflow.
  3. Clear the DIA files from part A.
  4. Load DDA mzML files and also all pseudo-MS/MS mzML files generated by part A.
  5. On the ‘Run’ tab, click ‘RUN’.
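
Between part A and part B, it is easy to miss a pseudo-MS/MS file when many DIA runs are involved. The sketch below checks that each DIA input has its three files in the output folder. It assumes the files are named by appending the suffix to the input file's stem (e.g. run1.mzML gives run1_Q1.mzML); the function name is hypothetical.

```python
from pathlib import Path

# Suffixes of the pseudo-MS/MS files written by part A, as described above.
EXPECTED_SUFFIXES = ("_Q1.mzML", "_Q2.mzML", "_Q3.mzML")


def missing_pseudo_files(dia_inputs: list[Path], output_dir: Path) -> dict[str, list[str]]:
    """Map each DIA input name to the pseudo-MS/MS file names not found in output_dir.

    Assumption: output names are '<input stem><suffix>'. Inputs with all three
    files present are omitted from the result.
    """
    missing = {}
    for dia_file in dia_inputs:
        absent = [
            dia_file.stem + suffix
            for suffix in EXPECTED_SUFFIXES
            if not (output_dir / (dia_file.stem + suffix)).is_file()
        ]
        if absent:
            missing[dia_file.name] = absent
    return missing
```

An empty dictionary means every input has all three files and part B can load them alongside the DDA files.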


Quantify with DIA-NN

DIA-NN is available for download here.

  1. Click ‘Raw’ and load mzML files (or RAW format if DIA-NN has been configured to read the RAW format).
  2. Select the spectral library generated using FragPipe (‘library.tsv’ file).
  3. Choose where to write the output and the file name (e.g. ‘DIA-NN_DIALib.tsv’).
  4. Specify the number of threads to use.
  5. Set ‘Protein inference’ to ‘Off’, and ‘Quantification strategy’ to ‘Robust LC (high accuracy)’ (recommended). In the FragPipe-generated spectral library, shared peptides are assigned to proteins with the most evidence (razor proteins).
  6. Click ‘Run’.
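
Before selecting ‘library.tsv’ in DIA-NN, a quick look at the file can catch an empty or truncated library. The sketch below makes no assumption about column names; it only reads the file as tab-delimited text and reports the header and the number of data rows. The function name is hypothetical.

```python
import csv
from pathlib import Path


def summarize_tsv(path: Path) -> tuple[list[str], int]:
    """Return the header columns and the number of non-empty data rows of a TSV file."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader, [])  # empty list if the file has no lines at all
        n_rows = sum(1 for row in reader if row)
    return header, n_rows
```

For example, header, n_rows = summarize_tsv(Path("library.tsv")) followed by printing both values shows at a glance which columns are present and whether the library contains any entries.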


Key References

Tsou CC, Avtonomov D, Larsen B, Tucholska M, Choi H, Gingras AC, Nesvizhskii AI. DIA-Umpire: comprehensive computational framework for data-independent acquisition proteomics, Nature Methods 12:258-64 (2015).

Demichev V, Yu F, Teo GC, Szyrwiel L, Rosenberger G, Decker J, Kaspar-Schoenefeld S, Lilley KS, Mülleder M, Nesvizhskii AI, Ralser M. High sensitivity dia-PASEF proteomics with DIA-NN and FragPipe, bioRxiv (2021).


