Guide to FragPipe results
Output files will depend on the workflow used and how experiments/groups are set on the ‘Workflow’ tab. This page lists the different output files generated by FragPipe. Main report files are either comma-separated (csv) or tab-separated (tsv), and their column contents are described here. Outputs that take the name of the input LC-MS file are shown with the generic ‘filename’ placeholder here.
Log files are saved automatically (with a timestamp) if an analysis finishes successfully, but they can also be exported (with the ‘Export Log’ button on the ‘Run’ tab) to help with troubleshooting if an analysis fails.
FragPipe also uses various configuration and intermediate files listed here.
Also see our guide to using FragPipe.
Main report files
- psm.tsv (from Philosopher, updated by PTM-Shepherd and IonQuant)
- ion.tsv (from Philosopher, overwritten by IonQuant)
- peptide.tsv (from Philosopher, overwritten by IonQuant)
- protein.tsv (from Philosopher, overwritten by IonQuant)
- combined_ion.tsv (from Philosopher, overwritten by IonQuant)
- combined_modified_peptide.tsv (from IonQuant)
- combined_peptide.tsv (from Philosopher, overwritten by IonQuant)
- combined_protein.tsv (from Philosopher, overwritten by IonQuant)
- diann-output files (see DIA-NN documentation)
PTM-Shepherd reports
PTM-Shepherd reports on modification profiles and diagnostic ions (if enabled) are found in the ‘ptm-shepherd-output’ folder. If glycan assignment is used, PTM-Shepherd will write assigned glycan information directly to the psm.tsv table.
TMT/iTRAQ reports
Isobaric labeling reports are found in the ‘tmt-report’ folder, generated by TMT-Integrator. Two sets of isobaric labeling quantification reports are generated, one set with abundances and one with ratios. Files containing ‘ratio’ in the file name report ratios to the reference/bridge channel in each plex if one is specified, or otherwise ratios to the average abundance within each plex (virtual reference approach). Files containing ‘abundance’ are generated by converting the ratio tables back to the intensity (ion abundance) scale.
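The two ratio modes described above can be illustrated with a minimal sketch that computes per-channel log2 ratios within a single plex. The denominator is either a named reference/bridge channel or the plex average (virtual reference). This is not TMT-Integrator's implementation. The log2 scale, the use of an arithmetic mean for the virtual reference, and the channel names and values are assumptions chosen for illustration. Any filtering or normalization, and the conversion back to the abundance scale, are not shown.

```python
import math
from statistics import mean
from typing import Dict, Optional


def log2_ratios(abundances: Dict[str, float], reference: Optional[str] = None) -> Dict[str, float]:
    """Log2 ratio of every channel in one plex to a reference value.

    reference: name of the reference/bridge channel; if None, the arithmetic
    mean of all channels in the plex is used instead (virtual reference).
    """
    denominator = abundances[reference] if reference is not None else mean(abundances.values())
    return {channel: math.log2(value / denominator) for channel, value in abundances.items()}


# Hypothetical reporter-ion abundances for one PSM in a 4-channel plex.
plex = {"126": 100.0, "127N": 200.0, "127C": 400.0, "128N": 100.0}
print(log2_ratios(plex, reference="126"))  # {'126': 0.0, '127N': 1.0, '127C': 2.0, '128N': 0.0}
print(log2_ratios(plex))  # plex mean is 200.0 -> {'126': -1.0, '127N': 0.0, '127C': 1.0, '128N': -1.0}
```

With a bridge channel, every plex shares a physical reference sample. The virtual reference instead relies on the plex average, which only makes sense when the channels are expected to be comparable on average.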
SILAC/dimethyl reports
Quantification results from MS1-based isotopic labeling experiments are generated by IonQuant and reported at the peptide ion, peptide, and protein levels.
Other reports
- protein.fas - FASTA file containing the FDR-filtered protein sequences identified in the analysis, generated by Philosopher
- MSstats.csv - protein abundance report formatted for use with MSstats (see the tutorial), generated by IonQuant
- reprint.int.tsv - input file for the Resource for Evaluation of Protein Interaction Networks (REPRINT) containing protein intensities, generated by Philosopher
- reprint.spc.tsv - input file for the Resource for Evaluation of Protein Interaction Networks (REPRINT) containing protein spectral counts, generated by Philosopher
- library.tsv - spectral library, generated by either EasyPQP (default) or SpectraST
Log files
- filter.log - FDR filtering-specific portion of the log generated by Philosopher, shows the number of PSMs, ions, peptides, and proteins passing the cutoff
- log_[timestamp].txt - complete log of the FragPipe analysis
Configuration files
- fragger.params - configuration file for MSFragger search parameters
- fragpipe_[timestamp].config - configuration file for the entire FragPipe analysis
- shepherd.config - parameter file for PTM-Shepherd
- annotation.txt - annotation file containing labels for TMT/iTRAQ channels
- tmt-integrator-conf.yml - configuration file for TMT-Integrator
Intermediate files
- [filename].pepXML - peptide-spectrum matches from the database search with MSFragger
- [filename]_c.pepXML - pepXML file after curation by Crystal-C
- [filename].pin - peptide-spectrum matches from the database search with MSFragger, formatted for validation with Percolator
- interact-[filename].pep.xml - peptide-spectrum matches with validation information generated by PeptideProphet via Philosopher or by Percolator
- interact-[filename].mod.pep.xml - peptide-spectrum matches with validation information generated by PeptideProphet (or Percolator) and localization information generated by PTMProphet via Philosopher
- combined.prot.xml - protein identifications with validation information generated by ProteinProphet via Philosopher
- filelist_proteinprophet.txt - list of interact.pep.xml files to be passed to ProteinProphet, gets around Windows command length limitations for very large experiments
- filelist_ionquant.txt - similar list of files to be passed to IonQuant
psm.tsv
psm.tsv files contain FDR-filtered search results, where each row contains a peptide-spectrum match (PSM). A separate psm.tsv file will be generated for each experiment specified on the ‘Workflow’ tab. Contents of each column are listed below.
Spectrum MS/MS spectrum identifier, follows the format (file name).(scan #).(scan #).(charge)
Spectrum File name of originating identification file
Peptide peptide amino acid sequence, any modifications not included (‘stripped’ peptide sequence)
Modified Peptide peptide sequence including modifications, modified residues are followed by brackets containing the integer mass (in Da) of the residue plus the modification; blank if peptide is unmodified
Prev AA residue preceding the identified peptide within the mapped protein sequence; “-” if none
Next AA residue following the identified peptide within the mapped protein sequence; “-” if none
Peptide Length number of residues in the peptide sequence
Charge charge state of the identified peptide
Retention MS2 scan’s precursor retention time (in seconds)
Observed Mass mass of the identified peptide (in Da)
Calibrated Observed Mass mass of the identified peptide after m/z calibration (in Da)
Observed M/Z mass-to-charge ratio of the peptide ion
Calibrated Observed M/Z mass-to-charge ratio of the peptide ion after m/z calibration
Calculated Peptide Mass theoretical peptide mass based on identified sequence and modifications
Calculated M/Z theoretical peptide mass-to-charge ratio based on identified sequence and modifications
Delta Mass difference between calibrated observed peptide mass and calculated peptide mass (in Da)
Expectation expectation value from statistical modeling with PeptideProphet, lower values indicate higher likelihood
Hyperscore similarity score between observed and theoretical spectra, higher values indicate greater similarity
Nextscore similarity score (hyperscore) of second-highest scoring match for the spectrum
PeptideProphet Probability confidence score determined by PeptideProphet, higher values indicate greater confidence
Number of Enzymatic Termini 2 = fully-enzymatic, 1 = semi-enzymatic, 0 = non-enzymatic
Number of Missed Cleavages number of potential enzymatic cleavage sites within the identified sequence
Protein Start starting position of the identified peptide within the protein sequence
Protein End ending position of the identified peptide within the protein sequence
Intensity precursor abundance (area under the curve) for each PSM if IonQuant is used; or maximum MS1 peak intensity within the retention time tolerance if Philosopher freequant is used (not recommended)
Ion Mobility TIMS transit time of the precursor ion (1/K0)
Assigned Modifications variable modifications (listed by mass in Da) with modified residue and location within the peptide
Observed Modifications modifications from Delta Mass values as mapped to Unimod entries. Assigned glycan composition will be placed here if glycan composition assignment is performed (PTM-Shepherd).
MSFragger Localization MSFragger-determined localization for open/offset searches, if using localize_delta_mass. Lower case letter(s) indicate localized site(s). More than one lower case letter indicates ambiguous localization. If all letters are upper case, the unlocalized candidate got a higher score and no localization information is known.
Best Score with Delta Mass highest observed hyperscore when the Delta Mass is placed on the theoretical spectrum (from open/offset search)
Best Score without Delta Mass highest hyperscore observed without placing the Delta Mass on the theoretical spectrum (from open/offset search)
[modified residue]:[modification mass] localization probabilities for each residue/mass pair provided to PTMProphet, where the localization probability of each site (closer to 1 = more confident) is denoted in parentheses following the site (e.g., GS(0.101)DRT(0.899)PER in the column STY:79.9663, where phosphorylation probability is higher at T5 than S2); probabilities add up to the number of modified sites
Glycan Score (only present if glycan composition assignment was performed in PTM-Shepherd). Score assigned to the glycan composition. Higher is better.
Glycan q-value (only present if glycan composition assignment was performed in PTM-Shepherd). Q-value for the glycan composition assignment from the glycan FDR calculation. NOTE: all PSMs that pass peptide FDR are reported, even if they do not pass the glycan FDR. For example, to apply a 1% glycan FDR, keep only rows with a q-value below 0.01 in this column.
O-Pair Score (only present if O-Pair was run). Score from O-Pair localization. Higher is better.
Number of Glycans (only present if O-Pair was run). Plausible number of total glycans assigned by O-Pair.
Total Glycan Composition (only present if O-Pair was run). Plausible total (summed) glycan composition for all glycans assigned by O-Pair. N=HexNAc, H=Hex, A=NeuAc, G=NeuGc, F=Fuc
Glycan Site Composition(s) (only present if O-Pair was run). Glycan compositions at each site assigned by O-Pair (in order from N-terminal to C-terminal).
Confidence Level (only present if O-Pair was run). O-Pair localization confidence level. 1=All glycans localized with spectral evidence, 1b=All glycans localized, but by process of elimination (not all with evidence). 2=Some glycans localized but not all. 3=No glycans localized
Site Probabilities (only present if O-Pair was run). O-Pair localization probabilities for each localized site. Format is “[Residue number, glycan composition, probability]”
138/144 Ratio (only present if O-Pair was run). Ratio of oxonium ions detected at m/z 138 to 144 for distinguishing GlcNAc and GalNAc glycans
Has N-Glyc Sequon (only present if O-Pair was run). Whether the N-X-S/T sequon was detected in the peptide sequence.
Paired Scan Num (only present if O-Pair was run). The scan number of the paired scan used for O-glycan localization in O-Pair. The scan number reported in the Spectrum column of the table is for the collisional activation scan (and all information in the row is for the collisional scan, except the O-glycan localization from O-Pair, which comes from the paired scan listed here).
[modified residue]:[modification mass] Best Localization highest observed localization probability (from PTMProphet) for this modification within the peptide
Purity proportion of total ion abundance in the inclusion window from the precursor (including precursor isotopic peaks), from Philosopher freequant
Is Unique whether the identified sequence maps to a single identified protein (FALSE if shared between multiple proteins identified in the experiment)
Protein protein sequence header corresponding to the identified peptide sequence; this will be the selected razor protein if the peptide maps to multiple proteins (in this case, other mapped proteins are listed in the ‘Mapped Proteins’ column)
Protein ID protein identifier (primary accession number) for the selected protein
Entry Name entry name for the selected protein
Gene gene name for the selected protein
Protein Description name of the selected protein
Mapped Genes additional genes the identified peptide may originate from
Mapped Proteins additional proteins the identified peptide maps to (including any arising from I/L substitutions)
(additional columns for TMT/iTRAQ channels if used, where each contains the relative reporter ion abundances for that PSM)
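Several psm.tsv columns described above pack structured information into a single string. The sketch below shows one way to unpack four of them in plain Python: the Spectrum identifier, a PTMProphet localization column, the MSFragger Localization column, and a Glycan q-value filter. It assumes the values follow the formats given in this table exactly. For example, it assumes the PTMProphet string contains only upper-case residues and parenthesized probabilities. The function names and example values are hypothetical helpers, not part of FragPipe.

```python
import re
from typing import Dict, List, Tuple


def parse_spectrum_id(spectrum: str) -> Tuple[str, int, int, int]:
    """Split a Spectrum value of the form (file name).(scan #).(scan #).(charge).

    Splitting from the right keeps any dots that belong to the file name itself.
    """
    name, first_scan, second_scan, charge = spectrum.rsplit(".", 3)
    return name, int(first_scan), int(second_scan), int(charge)


# One residue, optionally followed by "(probability)", e.g. "S(0.101)".
_SITE = re.compile(r"([A-Z])(?:\(([0-9.]+)\))?")


def parse_ptmprophet(annotated: str) -> Dict[int, Tuple[str, float]]:
    """Map 1-based residue position -> (residue, probability) for a
    PTMProphet column value such as "GS(0.101)DRT(0.899)PER"."""
    sites = {}
    for position, match in enumerate(_SITE.finditer(annotated), start=1):
        residue, probability = match.groups()
        if probability is not None:
            sites[position] = (residue, float(probability))
    return sites


def msfragger_sites(localization: str) -> List[int]:
    """1-based positions of lower-case (localized) residues in the MSFragger
    Localization column; an empty list means no localization information."""
    return [i for i, residue in enumerate(localization, start=1) if residue.islower()]


def passes_glycan_fdr(row: Dict[str, str], threshold: float = 0.01) -> bool:
    """True if the row's Glycan q-value is present and below the threshold."""
    value = row.get("Glycan q-value", "")
    return value != "" and float(value) < threshold


print(parse_spectrum_id("run.01.12345.12345.2"))  # ('run.01', 12345, 12345, 2)
sites = parse_ptmprophet("GS(0.101)DRT(0.899)PER")
print(sites)  # {2: ('S', 0.101), 5: ('T', 0.899)}
print(round(sum(p for _, p in sites.values())))  # 1, i.e. one modified site
print(msfragger_sites("PEPtIDsK"))  # [4, 7]: ambiguous between two sites
```

Rows can be read with `csv.DictReader(handle, delimiter="\t")` and passed to `passes_glycan_fdr` one at a time.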
ion.tsv
ion.tsv files contain FDR-filtered search results, where each row contains a peptide sequence with a certain charge and modification state. PSMs are collapsed into a single ion. A separate ion.tsv file will be generated for each experiment specified on the ‘Workflow’ tab. Contents of each column are listed below.
Peptide Sequence peptide amino acid sequence, any modifications not included (‘stripped’ peptide sequence)
Modified Sequence peptide sequence including modifications, modified residues are followed by brackets containing the integer mass (in Da) of the residue plus the modification; blank if peptide is unmodified
Prev AA residue preceding the identified peptide within the mapped protein sequence; “-” if none
Next AA residue following the identified peptide within the mapped protein sequence; “-” if none
Peptide Length number of residues in the peptide sequence
M/Z calculated (theoretical) peptide mass-to-charge ratio based on identified sequence and modifications
Charge peptide ion charge state
Observed Mass calculated mass of the identified peptide (in Da)
Probability confidence score determined by PeptideProphet, higher values indicate greater confidence
Expectation expectation value from statistical modeling with PeptideProphet, lower values indicate higher likelihood
Spectral Count number of corresponding PSMs
Intensity maximum intensity from all observed PSMs for the ion
Assigned Modifications variable modifications (listed by modification mass in Da) with modified residue and location within the peptide
Observed Modifications for peptides identified with non-zero delta masses (from open or mass offset searches), modifications mapping to a Unimod entry of the corresponding delta mass are listed here
Protein protein sequence header corresponding to the identified peptide sequence; this will be the selected razor protein if the peptide maps to multiple proteins (in this case, other mapped proteins are listed in the ‘Mapped Proteins’ column)
Protein ID UniProt protein identifier (primary accession number)
Entry Name entry name for the selected protein
Gene gene name for the selected protein
Protein Description name of the selected protein
Mapped Genes additional genes the identified peptide may originate from (including any arising from I/L substitutions)
Mapped Proteins additional proteins the identified peptide maps to (including any arising from I/L substitutions)
(additional columns for TMT/iTRAQ channels if applicable, each contains relative reporter ion abundances)
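The collapse from PSMs to ions described above can be sketched by grouping psm.tsv rows on (stripped peptide, modified peptide, charge). Each group then gets the Spectral Count and Intensity definitions from this table: the number of PSMs and the maximum PSM intensity. This is a minimal illustration of the column descriptions, not Philosopher's code. Because IonQuant overwrites ion.tsv when it is used, the reported values may not follow this simple rule exactly. The example rows are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

IonKey = Tuple[str, str, int]  # (Peptide, Modified Peptide, Charge)


def collapse_psms_to_ions(psms: List[Dict[str, str]]) -> Dict[IonKey, Dict[str, float]]:
    """Group psm.tsv rows into ions, following the ion.tsv column descriptions."""
    ions: Dict[IonKey, Dict[str, float]] = defaultdict(lambda: {"Spectral Count": 0, "Intensity": 0.0})
    for psm in psms:
        # Modified Peptide is blank for unmodified peptides.
        key = (psm["Peptide"], psm["Modified Peptide"], int(psm["Charge"]))
        ion = ions[key]
        ion["Spectral Count"] += 1  # each psm.tsv row is one PSM
        ion["Intensity"] = max(ion["Intensity"], float(psm["Intensity"]))  # maximum over PSMs
    return dict(ions)


# Hypothetical rows as read from psm.tsv (e.g. with csv.DictReader(..., delimiter="\t")).
psms = [
    {"Peptide": "PEPTIDEK", "Modified Peptide": "", "Charge": "2", "Intensity": "1.5e6"},
    {"Peptide": "PEPTIDEK", "Modified Peptide": "", "Charge": "2", "Intensity": "2.0e6"},
    {"Peptide": "PEPTIDEK", "Modified Peptide": "", "Charge": "3", "Intensity": "4.0e5"},
]
for key, summary in collapse_psms_to_ions(psms).items():
    print(key, summary)
```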
peptide.tsv
peptide.tsv files contain FDR-filtered search results, where each row is an identified peptide sequence. Ions are collapsed into a single peptide. A separate peptide.tsv file will be generated for each experiment specified on the ‘Workflow’ tab. Contents of each column are listed below.
Peptide peptide amino acid sequence, no modifications included (‘stripped’ peptide sequence)
Prev AA residue preceding the identified peptide within the mapped protein sequence; “-” if none
Next AA residue following the identified peptide within the mapped protein sequence; “-” if none
Peptide Length number of residues in the peptide sequence
Charges peptide ion charge state(s)
Probability confidence score determined by PeptideProphet, higher values indicate greater confidence
Spectral Count number of corresponding PSMs
Intensity summed intensity of the top 3 most abundant ions for the peptide
Assigned Modifications variable modifications (listed by mass in Da) with modified residue and location within the peptide
Observed Modifications for peptides identified with non-zero delta masses (from open or mass offset searches), modifications mapping to a Unimod entry of the corresponding delta mass are listed here
Protein protein sequence header corresponding to the identified peptide sequence; this will be the selected razor protein if the peptide maps to multiple proteins (in this case, other mapped proteins are listed in the ‘Mapped Proteins’ column)
Protein ID protein identifier (primary accession number) for the selected protein
Entry Name entry name for the selected protein
Gene gene name for the selected protein
Protein Description name of the selected protein
Mapped Genes additional genes the identified peptide may originate from (including any arising from I/L substitutions)
Mapped Proteins additional proteins the identified peptide maps to (including any arising from I/L substitutions)
(additional columns for TMT/iTRAQ channels if applicable, each contains relative reporter ion abundances)
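The peptide-level summary described above can be sketched from the ion rows that share one stripped sequence. The sketch takes the union of charge states and the total number of PSMs, and sums the intensities of the top 3 most abundant ions. It is an illustration of the column descriptions under the assumption that ion rows are available as dictionaries. It is not the code Philosopher or IonQuant actually runs, and the example values are hypothetical.

```python
from typing import Dict, List


def summarize_peptide(ions: List[Dict[str, str]], top_n: int = 3) -> Dict[str, object]:
    """Collapse the ion-level rows of one stripped peptide sequence."""
    intensities = sorted((float(ion["Intensity"]) for ion in ions), reverse=True)
    return {
        "Charges": sorted({int(ion["Charge"]) for ion in ions}),
        "Spectral Count": sum(int(ion["Spectral Count"]) for ion in ions),
        # Sum of the top-N most abundant ions (N = 3 in the column description).
        "Intensity": sum(intensities[:top_n]),
    }


# Hypothetical ion.tsv rows that share the stripped sequence PEPTIDEK.
ions = [
    {"Charge": "2", "Spectral Count": "3", "Intensity": "5.0"},
    {"Charge": "3", "Spectral Count": "1", "Intensity": "1.0"},
    {"Charge": "2", "Spectral Count": "2", "Intensity": "3.0"},  # e.g. a modified form
    {"Charge": "4", "Spectral Count": "1", "Intensity": "4.0"},
]
print(summarize_peptide(ions))  # {'Charges': [2, 3, 4], 'Spectral Count': 7, 'Intensity': 12.0}
```

The weakest ion (intensity 1.0) is left out of the top-3 sum, so a single low-quality ion does not drag the peptide intensity around.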
protein.tsv
protein.tsv files contain FDR-filtered protein results, where each row is an identified protein group. A separate protein.tsv file will be generated for each experiment specified on the ‘Workflow’ tab. Contents of each column are listed below.
Group protein group number
SubGroup protein subgroup identifier
Protein protein sequence header
Protein ID UniProt protein identifier (primary accession number)
Entry Name protein entry name
Gene gene name
Length number of residues in protein sequence
Percent Coverage percent of protein sequence observed from the identified peptides
Organism species of identified protein
Protein Description protein name
Protein Existence type of evidence that supports the existence of the protein
Protein Probability confidence score determined by ProteinProphet
Top Peptide Probability best peptide probability of supporting peptides
Total Peptides number of peptides (stripped sequences) that can be mapped to the protein. These include
- peptides that map only to this protein
- peptides that map to multiple proteins but are assigned to this protein by the protein inference algorithm
- peptides that map to multiple proteins but are assigned to another protein by the protein inference algorithm
Unique Peptides number of peptides (stripped sequences) that map only to this protein
Razor Peptides number of peptides (stripped sequences) supporting the protein identification. These include
- peptides that map only to this protein
- peptides that map to multiple proteins but are assigned to this protein by the protein inference algorithm
Total Spectral Count number of PSMs corresponding to the total peptides
Unique Spectral Count number of PSMs corresponding to the unique peptides
Razor Spectral Count number of PSMs corresponding to the razor peptides
Total Intensity protein intensity calculated using the total peptides (from the top-N algorithm)
Unique Intensity protein intensity calculated using the unique peptides (from the top-N algorithm)
Razor Intensity protein intensity calculated using the razor peptides (from the top-N algorithm)
Razor Assigned Modifications modifications from the razor peptides
Razor Observed Modifications Delta Mass values from the razor peptides
Indistinguishable Proteins proteins that are equally supported by the evidence and cannot be distinguished from the identification in the ‘Protein’ column
(additional columns for TMT/iTRAQ channels if applicable, each contains relative reporter ion abundances)
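The distinction between total, unique, and razor peptides described above can be made concrete with a small counting sketch. The sketch takes two inputs: a peptide-to-protein mapping, and the protein chosen by protein inference for each shared peptide. Both inputs are hypothetical, since the inference itself is done by ProteinProphet/Philosopher and is not reproduced here. The sketch also ignores protein grouping and indistinguishable proteins. Counting PSMs instead of peptides in each branch would give the corresponding Spectral Count columns.

```python
from typing import Dict, Set


def peptide_counts(mapping: Dict[str, Set[str]], razor: Dict[str, str]) -> Dict[str, Dict[str, int]]:
    """Total, Unique and Razor peptide counts per protein.

    mapping: stripped peptide -> every protein it maps to
    razor:   shared peptide -> protein chosen by protein inference (hypothetical input)
    """
    counts: Dict[str, Dict[str, int]] = {}
    for peptide, proteins in mapping.items():
        for protein in proteins:
            c = counts.setdefault(protein, {"Total": 0, "Unique": 0, "Razor": 0})
            c["Total"] += 1  # any peptide that maps to the protein
            if len(proteins) == 1:
                c["Unique"] += 1  # maps only to this protein
                c["Razor"] += 1   # unique peptides also support the identification
            elif razor.get(peptide) == protein:
                c["Razor"] += 1   # shared, but assigned to this protein
    return counts


mapping = {"AAAK": {"P1"}, "CCCK": {"P1", "P2"}, "DDDK": {"P1", "P2"}, "EEEK": {"P2"}}
razor = {"CCCK": "P1", "DDDK": "P1"}
counts = peptide_counts(mapping, razor)
for protein in sorted(counts):
    print(protein, counts[protein])
# P1 {'Total': 3, 'Unique': 1, 'Razor': 3}
# P2 {'Total': 3, 'Unique': 1, 'Razor': 1}
```

P2 still counts CCCK and DDDK among its total peptides, but not among its razor peptides, because both shared peptides were assigned to P1.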
combined_ion.tsv
combined_ion.tsv files contain FDR-filtered ions from all experimental groups, where each row contains a peptide sequence with a certain charge and modification state. Individual PSMs are collapsed. Contents of each column are listed below.
Peptide Sequence peptide amino acid sequence, any modifications not included (‘stripped’ peptide sequence)
Modified Sequence peptide amino acid sequence including modifications
Prev AA residue preceding the identified peptide within the mapped protein sequence; “-” if none
Next AA residue following the identified peptide within the mapped protein sequence; “-” if none
Start starting position of the peptide within the mapped protein sequence
End ending position of the peptide within the mapped protein sequence
Peptide Length number of residues in the peptide sequence
M/Z theoretical peptide ion mass-to-charge ratio based on identified sequence, charge, and modifications
Charge charge state of the identified peptide ion
Assigned Modifications variable modifications of the peptide found in the experiment, each listed by mass in Da with modified residue and location within the peptide sequence
Protein protein sequence header corresponding to the identified peptide sequence; this will be the selected razor protein if the peptide maps to multiple proteins (in this case, other mapped proteins are listed in the ‘Mapped Proteins’ column)
Protein ID protein identifier (primary accession number) for the selected protein
Entry Name protein entry name
Gene gene name of the protein corresponding to the identified peptide sequence; this will be from the selected razor protein if the peptide maps to multiple proteins
Protein Description name of the selected protein
[experiment] Spectral Count count of peptide-spectrum matches (PSMs) in the sample that support the peptide identification
[experiment] Intensity normalized ion intensities
combined_modified_peptide.tsv
combined_modified_peptide.tsv files contain FDR-filtered peptides from all experimental groups, where each row is a peptide sequence including modifications. Individual ions are collapsed. Contents of each column are listed below.
Peptide Sequence peptide amino acid sequence, modifications not included (‘stripped’ peptide sequence)
Modified Sequence peptide amino acid sequence plus modifications
Prev AA amino acid residue preceding the peptide sequence
Next AA amino acid residue following the peptide sequence
Start starting position of the peptide within the mapped protein sequence
End ending position of the peptide within the mapped protein sequence
Peptide Length number of amino acid residues in the identified peptide
Charges all observed charge states for the modified peptide in the experiment
Assigned Modifications all variable modifications of the peptide found in the experiment, each listed by mass in Da with modified residue and location within the peptide sequence
Protein protein sequence header corresponding to the identified peptide sequence; this will be the selected razor protein if the peptide maps to multiple proteins (in this case, other mapped proteins are listed in the ‘Mapped Proteins’ column)
Protein ID protein identifier (primary accession number) for the selected protein
Entry Name protein entry name corresponding to the identified peptide sequence
Gene gene name of the protein corresponding to the identified peptide sequence; this will be from the selected razor protein if the peptide maps to multiple proteins
Protein Description name of the selected protein
Mapped Genes additional genes the identified peptide may originate from (including any arising from I/L substitutions)
Mapped Proteins additional proteins the identified peptide maps to (including any arising from I/L substitutions)
[experiment] Spectral Count count of peptide-spectrum matches (PSMs) in the sample that support the peptide identification
[experiment] Intensity normalized peptide intensities
[experiment] MaxLFQ Intensity normalized peptide intensities calculated with the MaxLFQ method (this column is only present if ‘MaxLFQ’ is selected)
combined_peptide.tsv
combined_peptide.tsv files contain FDR-filtered peptides from all experimental groups, where each row is a (stripped) peptide sequence. Modified versions of peptides are collapsed. Contents of each column are listed below.
Peptide Sequence peptide amino acid sequence, any modifications not included (‘stripped’ peptide sequence)
Prev AA amino acid residue preceding the peptide sequence
Next AA amino acid residue following the peptide sequence
Start starting position of the peptide within the mapped protein sequence
End ending position of the peptide within the mapped protein sequence
Peptide Length number of amino acid residues in the identified peptide
Charges all observed charge states for the peptide in the experiment
Protein protein sequence header corresponding to the identified peptide sequence; this will be the selected razor protein if the peptide maps to multiple proteins (in this case, other mapped proteins are listed in the ‘Mapped Proteins’ column)
Protein ID protein identifier (primary accession number) for the selected protein
Entry Name protein entry name corresponding to the identified peptide sequence
Gene gene name of the protein corresponding to the identified peptide sequence; this will be from the selected razor protein if the peptide maps to multiple proteins
Protein Description name of the selected protein
Mapped Genes additional genes the identified peptide may originate from (including any arising from I/L substitutions)
Mapped Proteins additional proteins the identified peptide maps to (including any arising from I/L substitutions)
[experiment] Spectral Count count of peptide-spectrum matches (PSMs) in the sample that support the peptide identification
[experiment] Intensity normalized peptide sequence intensities
[experiment] MaxLFQ Intensity normalized peptide sequence intensities calculated with the MaxLFQ method (this column is only present if ‘MaxLFQ’ is selected)
combined_protein.tsv
combined_protein.tsv files contain FDR-filtered proteins from all experimental groups, where each row is a protein group. Contents of each column are listed below.
Protein protein sequence header corresponding to the identified peptide sequence inferred from combined evidence; this will be the selected razor protein if the peptide maps to multiple proteins
Protein ID protein identifier (primary accession number) for the selected protein
Entry Name entry name for the selected protein
Gene gene name for the selected protein
Protein Length number of amino acid residues in the selected protein
Coverage percent of total protein length represented by the identified peptides
Organism species corresponding to the protein identification
Protein Existence type of evidence for the existence of the protein
Description name of the selected protein
Protein Probability confidence score determined by ProteinProphet from combined evidence, higher values indicate greater confidence
Top Peptide Probability highest PeptideProphet confidence score from all peptides that map to the protein
Combined Total Peptides number of peptides (stripped sequences) that can be mapped to the protein. These include
- peptides that map only to this protein
- peptides that map to multiple proteins but are assigned to this protein by the protein inference algorithm
- peptides that map to multiple proteins but are assigned to another protein by the protein inference algorithm
Combined Spectral Count number of PSMs corresponding to the razor peptides (see the description of razor peptides under protein.tsv for details)
Combined Unique Spectral Count number of PSMs corresponding to the unique peptides (see the description of unique peptides under protein.tsv for details)
Combined Total Spectral Count number of PSMs corresponding to the total peptides (see the description of total peptides under protein.tsv for details)
[experiment] Spectral Count number of PSMs in the sample corresponding to the razor peptides
[experiment] Unique Spectral Count number of PSMs in the sample corresponding to the unique peptides
[experiment] Total Spectral Count number of PSMs in the sample corresponding to the total peptides
[experiment] Intensity normalized (by default) protein intensity using the razor peptides (from the top-N algorithm)
[experiment] Unique Intensity normalized (by default) protein intensity using the unique peptides (from the top-N algorithm)
[experiment] Total Intensity normalized (by default) protein intensity using the total peptides (from the top-N algorithm)
[experiment] MaxLFQ Intensity normalized (by default) protein intensity using the razor peptides (from the MaxLFQ method)
[experiment] MaxLFQ Unique Intensity normalized (by default) protein intensity using the unique peptides (from the MaxLFQ method)
[experiment] MaxLFQ Total Intensity normalized (by default) protein intensity using the total peptides (from the MaxLFQ method)
Indistinguishable Proteins proteins that cannot be distinguished from the selected protein given all sequences/evidence identified in the experiment
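Because the per-experiment columns above are named "[experiment] ..." and several quantities share the same ending (e.g., "Intensity", "Unique Intensity", "MaxLFQ Intensity"), a naive `endswith(" Intensity")` test picks up the wrong columns. The sketch below matches the longest known suffix first and skips the "Combined ..." totals. It assumes each column is named "<experiment> <quantity>" with a single space and that no experiment is itself named "Combined". The helper names and example header are hypothetical.

```python
import csv
from typing import Dict, List, Tuple

# Per-experiment column suffixes, ordered so that a longer suffix is tried
# before any shorter suffix it ends with (e.g. " MaxLFQ Intensity" before " Intensity").
SUFFIXES = [
    " MaxLFQ Unique Intensity", " MaxLFQ Total Intensity", " MaxLFQ Intensity",
    " Unique Spectral Count", " Total Spectral Count", " Spectral Count",
    " Unique Intensity", " Total Intensity", " Intensity",
]


def experiment_columns(header: List[str], suffix: str) -> Dict[str, str]:
    """Map experiment name -> column name for one per-experiment quantity."""
    columns = {}
    for column in header:
        matched = next((s for s in SUFFIXES if column.endswith(s)), None)
        if matched != suffix:
            continue
        experiment = column[: -len(suffix)]
        if experiment != "Combined":  # skip "Combined Spectral Count" and similar totals
            columns[experiment] = column
    return columns


def read_tsv(path: str) -> Tuple[List[str], List[Dict[str, str]]]:
    """Read a FragPipe .tsv report into its header and a list of row dicts."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        rows = list(reader)
        return list(reader.fieldnames or []), rows


# Hypothetical header; in practice: header, rows = read_tsv("combined_protein.tsv")
header = ["Protein", "Combined Spectral Count", "Combined Total Spectral Count",
          "ctrl_1 Spectral Count", "ctrl_1 Intensity", "ctrl_1 MaxLFQ Intensity",
          "treat_1 Spectral Count", "treat_1 Intensity", "treat_1 MaxLFQ Intensity"]
print(experiment_columns(header, " Intensity"))  # {'ctrl_1': 'ctrl_1 Intensity', 'treat_1': 'treat_1 Intensity'}
print(experiment_columns(header, " Spectral Count"))  # {'ctrl_1': 'ctrl_1 Spectral Count', 'treat_1': 'treat_1 Spectral Count'}
```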
global.profile.tsv
global.profile.tsv reports the most prominent features from PTM-Shepherd analysis of mass shifts observed in FDR-filtered open search results. Each row corresponds to a different detected mass shift, so not all PSMs are represented in this table. Please note that mass shifts are annotated based on Unimod mapping; these annotations are not definitive chemical identities and should be used as a starting point, together with the localization and amino acid enrichment information. Unless otherwise indicated, values are summed over all datasets in the analysis. Column contents are listed below.
peak_apex apex of the detected delta mass peak (in Da)
peak_lower lower bound of the detected peak (Da), determined by precursor tolerance or the detection of an adjacent peak
peak_upper upper bound of the detected peak (Da), determined by precursor tolerance or the detection of an adjacent peak
PSMs the number of PSMs contained within the peak boundary (bin), reported for each dataset if multiple datasets are used as input
peak_signal relative measure of peak prominence/quality; values are penalized in noisy regions of the delta mass histogram
percent_also_in_unmodified the percentage of PSMs in this mass bin with a corresponding PSM in the unmodified bin
mapped_mass_1 primary modification annotation derived from Unimod, all isobaric modifications listed and separated by “/”
mapped_mass_2 if the delta mass peak is a combination of two masses, a second modification annotation is listed here. As with mapped_mass_1, all isobaric modifications are listed and separated by “/”
similarity MS/MS spectral similarity of modified peptides compared to their unmodified counterparts. When multiple modified-unmodified comparisons are done for a single peptide, these cosine similarity scores are averaged for the peptide. The peptide scores are then averaged across all peptides in the mass shift bin. These comparisons are only done for peptides of the same charge state.
rt_shift retention time shift comparing modified peptides to their unmodified counterparts. When multiple modified-unmodified comparisons are done for a single peptide, the retention time shifts are averaged for the peptide. The peptide shifts are then averaged across all peptides in the mass shift bin. Individual comparisons are only done for peptides in the same LC-MS run. Units are usually seconds but can vary by instrument type.
int_log2fc log2 fold-change of average intensity for matched shifted/unshifted peptides, computed as described above. Peptides affected by sample preparation artifacts tend to be lower in abundance than their unshifted counterparts, so this value will be low in these cases.
localized_PSMs number of PSMs for this delta mass that showed at least one additional matched ion when the mass shift is placed on a residue
n-term_localization_rate percentage of PSMs with an uninterrupted string of localized residues from the N-terminus. This is calculated differently from other enrichment scores due to the difference in assumptions underlying N-terminal and residue-specific localization, so these values cannot be directly compared to the amino acid enrichment scores.
AA1 amino acid/residue most enriched (most likely to harbor the mass shift) compared to other residues
AA1_enrichment_score equivalent to the odds the delta mass is localized to AA1 compared to other residues
AA1_psm_count weighted number of PSMs where the mass shift localized to AA1. Shifts localizing to multiple residues are divided by the number of localized residues in the spectra, so this is an estimated number of PSMs localized to a particular residue
(the same enrichment_score and psm_count columns are given for AA2 and AA3 if multiple amino acids are likely to harbor the mass shift)
[experiment]_PSMs number of PSMs with a mass shift in this bin
[experiment]_percent_PSMs number of PSMs from the previous column as a percentage of total PSMs
[experiment]_peptides number of unique peptide sequences with a mass shift in this bin
[experiment]_percent_also_in_unmodified percentage of peptide sequences with a mass shift in this bin that are also found in the zero mass shift bin
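The two-level averaging behind the similarity and rt_shift columns, and the fractional counting behind AA1_psm_count, can be illustrated with a minimal Python sketch. It assumes the modified-unmodified pairs have already been matched (same charge state for similarity, same LC-MS run for rt_shift); the function names and input shapes are hypothetical and not part of PTM-Shepherd.

```python
from collections import defaultdict
from statistics import mean


def bin_average(comparisons):
    """Average a per-comparison metric (e.g. similarity or rt_shift) for one bin.

    comparisons: iterable of (peptide, value) pairs, one pair per
    modified-unmodified comparison in a single mass shift bin.
    """
    per_peptide = defaultdict(list)
    for peptide, value in comparisons:
        per_peptide[peptide].append(value)
    # Step 1: average the comparisons made for each peptide.
    peptide_means = [mean(values) for values in per_peptide.values()]
    # Step 2: average those peptide-level values across the bin.
    return mean(peptide_means)


def weighted_residue_counts(localizations):
    """Fractional PSM counts per residue, in the spirit of AA1_psm_count.

    localizations: one list per PSM holding the residues the mass shift
    localized to (an empty list means the shift was not localized).
    """
    counts = defaultdict(float)
    for residues in localizations:
        if not residues:
            continue  # unlocalized PSMs contribute no residue-level weight
        share = 1.0 / len(residues)  # split one PSM across its localized residues
        for residue in residues:
            counts[residue] += share
    return dict(counts)


print(bin_average([("PEPTIDEK", 10.0), ("PEPTIDEK", 20.0), ("SAMPLER", 30.0)]))  # 22.5
print(weighted_residue_counts([["M"], ["M", "K"], ["K", "S"], []]))
```

Averaging per peptide first keeps a peptide with many comparisons from dominating the bin value: in the example, PEPTIDEK contributes 15.0 once rather than two separate values.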
global.diagmine.tsv
global.diagmine.tsv is a mass shift-centric table that contains the diagnostic features identified for every mass shift. Please note that only mass shifts with detected diagnostic features are reported in the table. Contents of each column are listed below.
peak_apex This field contains the apex of the detected MS1 peak (Da) present in the global.profile.tsv file from PTM-Shepherd.
mod_annotation This field contains the mass shift annotations present in the global.profile.tsv file from PTM-Shepherd. When a mass shift is found to be the combination of two mass shifts, the “Potential Modification 1” and “Potential Modification 2” columns are merged with a semicolon.
type This field can take one of several values. “diagnostic” refers to diagnostic ions, the ions that can be located directly in the spectrum. “peptide” refers to peptide remainder masses, mass shifts that indicate an ion’s presence at a particular distance from an unshifted peptide. Six other values are possible depending on parameter settings, each corresponding to one of the major ion series.
mass This field contains the mass of the diagnostic feature. Peptide and fragment remainder masses will have the mass shift away from the theoretical ion. Diagnostic ions will have the m/z of the observed ion, so a non-neutral mass.
delta_mod_mass This field contains the mass that was lost from the original mass shift to arrive at the remainder mass. (Note: only present for peptide and fragment remainder masses.)
remainder_propensity This field contains the average percentage of ions from a particular series that are shifted. For example, a peptide capable of producing 10 b-ions, with 2 identified ions shifted by the remainder mass and 2 identified ions unshifted, would have a propensity of 50%. The propensity score for every representative PSM within a mass shift bin is averaged. (Note: only present for fragment remainder masses.)
percent_mod This field contains the percentage of representative mass shifted PSMs that contain the ion at any intensity.
percent_unmod This field contains the percentage of representative unshifted PSMs that contain the ion at any intensity.
avg_intensity_mod This field contains the average intensity of the ion among representative mass shifted PSMs where the ion is present. To calculate the average across all representative mass shifted spectra, calculate (avg_intensity_mod * percent_mod / 100). Because multiple ions can be matched for fragment remainder ions, this contains the average of the summed intensity of matched ions for each representative PSM.
avg_intensity_unmod This field contains the average intensity of the ion among representative unshifted PSMs where the ion is present. To calculate the average across all representative unshifted spectra, calculate (avg_intensity_unmod * percent_unmod / 100). Because multiple ions can be matched for fragment remainder ions, this contains the average of the summed intensity of matched ions for each representative PSM.
intensity_fold_change This field contains the fold change in intensity when comparing the modified to unmodified peptides. This uses intensity across all spectra and can be calculated via (avg_intensity_mod * percent_mod) / (avg_intensity_unmod * percent_unmod).
avg_charge This field contains the average charge of peptides from the mass shift. This enables researchers to use diagnostic ion information intelligently in designing targeted MS routines or rescoring.
auc This column contains the AUC-ROC statistic for the intensity-based classification of this ion. It is calculated from the U statistic from the Mann-Whitney U Test. This statistic adjusts the two groups such that they are assumed to be of equal size.
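The arithmetic described for remainder_propensity, the "where present" intensity averages, intensity_fold_change, and auc can be reproduced with the short Python sketch below. It is an illustration, not PTM-Shepherd code: the function names are hypothetical, and the AUC helper assumes each representative spectrum contributes one intensity value per group (how absent ions are encoded is an assumption left to the caller). The AUC uses the standard relationship AUC = U / (n_mod × n_unmod), which is what normalizes the two groups to a common scale.

```python
from statistics import mean


def remainder_propensity(n_shifted, n_unshifted):
    """Percent of identified ions in one series that carry the remainder mass (one PSM)."""
    identified = n_shifted + n_unshifted
    return 100.0 * n_shifted / identified if identified else 0.0


def bin_propensity(per_psm_counts):
    """Average the per-PSM propensity over the representative PSMs of a bin."""
    return mean(remainder_propensity(s, u) for s, u in per_psm_counts)


def average_over_all_spectra(avg_intensity, percent_present):
    """Turn a 'where present' average into an average over all representative spectra."""
    return avg_intensity * percent_present / 100.0


def intensity_fold_change(avg_mod, percent_mod, avg_unmod, percent_unmod):
    """(avg_intensity_mod * percent_mod) / (avg_intensity_unmod * percent_unmod)."""
    return (avg_mod * percent_mod) / (avg_unmod * percent_unmod)


def auc_from_mann_whitney(mod_values, unmod_values):
    """AUC-ROC as U / (n_mod * n_unmod); tied pairs contribute one half."""
    u = 0.0
    for x in mod_values:
        for y in unmod_values:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u / (len(mod_values) * len(unmod_values))


# The b-ion example from remainder_propensity: 2 shifted, 2 unshifted identified ions.
print(remainder_propensity(2, 2))  # 50.0
print(intensity_fold_change(1000.0, 40.0, 500.0, 20.0))  # 4.0
print(auc_from_mann_whitney([3.0, 5.0], [1.0, 3.0]))  # 0.875
```

Note that the factor of 100 cancels in the fold change, so it can be computed directly from the percent columns.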
global.modsummary.tsv
global.modsummary.tsv is a modification-centric table generated from PTM-Shepherd summarization of mass shifts observed in open search workflows. Please note that mass shifts are annotated based on Unimod mapping, so they are not definitive chemical identities and should be used as a starting point along with localization and amino acid enrichment information. Contents of each column are listed below.
Modification Name/annotation of the modification (as found in the global.profile.tsv file)
Theoretical Mass Shift The theoretical mass (in Da) of the modification from Unimod if annotated, or the peak apex of an unannotated modification
[experiment]_PSMs Number of PSMs with the modification, including any row from the global.profile.tsv file where the modification appears (e.g., a ‘Methylation’ entry in this table will include PSMs corresponding to both ‘Methylation’ and ‘Methylation + First isotopic peak’)
[experiment]_percent_PSMs The number of PSMs from the previous column as a percentage of the total PSMs
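The counting rule for [experiment]_PSMs, where a modification collects PSMs from every profile row in which it appears, can be sketched in Python as follows. This is a minimal illustration assuming that combined annotations are written with " + " between components, as in the ‘Methylation + First isotopic peak’ example; the function name and the (annotation, psm_count) input shape are hypothetical.

```python
def summarize_modification(profile_rows, modification, total_psms):
    """Count PSMs for one modification across annotated profile rows.

    profile_rows: (annotation, psm_count) pairs, e.g. taken from global.profile.tsv.
    A row counts if the modification is one of its ' + '-separated parts
    (assumed separator), so 'Dimethylation' does not count toward 'Methylation'.
    """
    psms = 0
    for annotation, count in profile_rows:
        parts = [part.strip() for part in annotation.split("+")]
        if modification in parts:
            psms += count
    percent = 100.0 * psms / total_psms if total_psms else 0.0
    return psms, percent


rows = [
    ("Methylation", 120),
    ("Methylation + First isotopic peak", 30),
    ("Dimethylation", 10),
    ("Oxidation", 50),
]
print(summarize_modification(rows, "Methylation", 1000))  # (150, 15.0)
```

Matching whole components rather than substrings is a deliberate choice here, so that related but distinct names are not merged.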
gene
[abundance/ratio]_gene_[normalization].tsv contains isobaric quantification information summarized from the psm.tsv tables by TMT-Integrator to the gene level. If ‘Group by’ is set to ‘Gene level’ (default for non-modification-centric quantification workflows) in the ‘Quant (Isobaric)’ tab of FragPipe, only gene-level reports will be generated. Set ‘Group by’ to ‘All’ to also generate protein and peptide-level reports. (Ratios are channel abundance / reference channel abundance; because the values are log2-transformed, a ratio in these tables equals [channel] - ReferenceIntensity.)
Index gene name (works best if the analyses were run with properly formatted FASTA sequence databases; see this page for more information)
NumberPSM total peptide-spectrum matches mapping to the gene that are used in quantification
ProteinID protein identifier mapping to the gene
MaxPepProb highest PeptideProphet probability of the PSMs mapping to the gene
ReferenceIntensity Real reference channel abundance is used if one has been provided; otherwise, these values are virtual reference abundances calculated as the average abundance across the channels in a plex (more usage information here). If the experiment contains multiple plexes, the average reference intensity across all plexes is used. Values are log2-scaled, with the global minimum reference intensity used to impute missing values.
[sample/channel name] normalized and log2 transformed abundance/ratio for the given reporter ion channel from summarization to the gene level
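Because both the channel columns and ReferenceIntensity are log2 values, converting between the abundance and ratio versions of a report is a subtraction or an addition. The Python sketch below shows this on an in-memory gene-level abundance table; the channel names and the set of strings treated as missing are assumptions for illustration, and for a real file you would pass the contents of the TSV (opened with `newline=""`) instead of the inline string.

```python
import csv
import io

MISSING = {"", "NA", "NaN"}  # assumption: how empty cells may be written


def log2_ratios(tsv_text, channels):
    """Log2 ratios to the reference from a log2 abundance report.

    ratio = channel - ReferenceIntensity, since both are log2 values.
    """
    ratios = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        reference = float(row["ReferenceIntensity"])
        ratios[row["Index"]] = {
            channel: None if row[channel] in MISSING else float(row[channel]) - reference
            for channel in channels
        }
    return ratios


def log2_abundance(log2_ratio, reference_intensity):
    """Inverse step: put a log2 ratio back on the log2 intensity scale."""
    return log2_ratio + reference_intensity


# Hypothetical gene-level abundance report with two channels.
report = (
    "Index\tNumberPSM\tReferenceIntensity\tsample_1\tsample_2\n"
    "GENE_A\t12\t20.0\t21.0\t19.5\n"
)
ratios = log2_ratios(report, ["sample_1", "sample_2"])
print(ratios)  # {'GENE_A': {'sample_1': 1.0, 'sample_2': -0.5}}
print(2 ** ratios["GENE_A"]["sample_1"])  # 2.0, the linear fold change
```

The same helper applies to the protein, peptide, multi-site, and single-site reports, since they share the ReferenceIntensity column and the log2 convention.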
protein
[abundance/ratio]_protein_[normalization].tsv contains isobaric quantification information summarized from the psm.tsv tables by TMT-Integrator to the protein level. If ‘Group by’ is set to ‘Protein’ in the ‘Quant (Isobaric)’ tab of FragPipe, only protein-level reports will be generated. Set ‘Group by’ to ‘All’ to also generate gene and peptide-level reports. (Ratios are channel abundance / reference channel abundance; because the values are log2-transformed, a ratio in these tables equals [channel] - ReferenceIntensity.)
Index protein name (FASTA sequence header)
NumberPSM total peptide-spectrum matches mapping to the protein that are used in quantification
Gene originating gene name for the protein
MaxPepProb highest PeptideProphet probability of the PSMs mapping to the protein that are used in quantification
ReferenceIntensity Real reference channel abundance is used if one has been provided; otherwise, these values are virtual reference abundances calculated as the average abundance across the channels in a plex (more usage information here). If the experiment contains multiple plexes, the average reference intensity across all plexes is used. Values are log2-scaled, with the global minimum reference intensity used to impute missing values.
[sample/channel name] normalized and log2 transformed abundance/ratio for the given reporter ion channel from summarization to the protein level
peptide
[abundance/ratio]_peptide_[normalization].tsv contains isobaric quantification information summarized from the psm.tsv tables by TMT-Integrator to the peptide level. If ‘Group by’ is set to ‘Peptide sequence’ in the ‘Quant (Isobaric)’ tab of FragPipe, only peptide-level reports will be generated. Set ‘Group by’ to ‘All’ to also generate gene and protein-level reports. (Ratios are channel abundance / reference channel abundance; because the values are log2-transformed, a ratio in these tables equals [channel] - ReferenceIntensity.)
Index protein name (FASTA sequence header) with the start and end positions of the peptide within the protein sequence
Gene originating gene name for the peptide
ProteinID protein identifier
Peptide stripped peptide sequence
MaxPepProb highest PeptideProphet probability of the PSMs with the sequence that are used in quantification
ReferenceIntensity Real reference channel abundance is used if one has been provided; otherwise, these values are virtual reference abundances calculated as the average abundance across the channels in a plex (more usage information here). If the experiment contains multiple plexes, the average reference intensity across all plexes is used. Values are log2-scaled, with the global minimum reference intensity used to impute missing values.
[sample/channel name] normalized and log2 transformed abundance/ratio for the given reporter ion channel from summarization to the stripped peptide sequence level
multi-site
[abundance/ratio]_multi-site_[normalization].tsv contains isobaric quantification information summarized from the psm.tsv tables by TMT-Integrator based on modification sites that have been observed and quantified together. If ‘Group by’ is set to ‘Multiple PTM sites’ in the ‘Quant (Isobaric)’ tab of FragPipe, only multi-site reports will be generated. Set ‘Group by’ to ‘All’ to generate reports at all levels. (Ratios are channel abundance / reference channel abundance; because the values are log2-transformed, a ratio in these tables equals [channel] - ReferenceIntensity.)
Index protein identifier with the start and end positions of the potential modification sites within the protein sequence, the count of the possible sites in that sequence window, the count of localized modifications, and the list of modified sites
Gene originating gene name
ProteinID protein identifier
Peptide stripped peptide sequence containing the modification sites; residues with localized modifications are shown in lowercase
MaxPepProb highest PeptideProphet probability of the PSMs with the sequence that are used in quantification
ReferenceIntensity Real reference channel abundance is used if one has been provided; otherwise, these values are virtual reference abundances calculated as the average abundance across the channels in a plex (more usage information here). If the experiment contains multiple plexes, the average reference intensity across all plexes is used. Values are log2-scaled, with the global minimum reference intensity used to impute missing values.
[sample/channel name] normalized and log2 transformed abundance/ratio for the given reporter ion channel from summarization to the multiple-mod site level
single-site
[abundance/ratio]_single-site_[normalization].tsv contains isobaric quantification information summarized from the psm.tsv tables by TMT-Integrator to the level of single post-translationally modified sites. If ‘Group by’ is set to ‘Single PTM site’ in the ‘Quant (Isobaric)’ tab of FragPipe, only single-site reports will be generated. Set ‘Group by’ to ‘All’ to generate reports at all levels. (Ratios are channel abundance / reference channel abundance; because the values are log2-transformed, a ratio in these tables equals [channel] - ReferenceIntensity.)
Index protein name (FASTA sequence header) with the modified site location within the protein sequence
Gene originating gene name
Peptide stripped peptide sequence containing the modification sites; residues with localized modifications are shown in lowercase
SequenceWindow peptide sequence around the localized modified site.
MaxPepProb highest PeptideProphet probability of the PSMs with the sequence that are used in quantification
ReferenceIntensity Real reference channel abundance is used if one has been provided; otherwise, these values are virtual reference abundances calculated as the average abundance across the channels in a plex (more usage information here). If the experiment contains multiple plexes, the average reference intensity across all plexes is used. Values are log2-scaled, with the global minimum reference intensity used to impute missing values.
[sample/channel name] normalized and log2 transformed abundance/ratio for the given reporter ion channel from summarization to the single modification site level (a site is quantified from PSMs carrying only that site when such PSMs are available; if the site is only found together with additional sites, the median of all localized sites is used)
ion_label_quant.tsv
ion_label_quant.tsv contains MS1-based isotopic quantification results from IonQuant at the ion (peptide + modification state + charge) level. See the SILAC tutorial for more information. If only Light and Heavy labels are used, columns with ‘Medium’/‘M’ will be missing.
Peptide Sequence stripped peptide sequence of the ion
Modified Peptide peptide sequence plus variable modifications denoted in brackets following modified residues
Peptide Length number of amino acid residues in the peptide ion
Charge precursor charge state of the ion
Label Count number of potentially labeled sites within the peptide
[Light/Medium/Heavy] Modified Peptide peptide sequence showing variable modifications plus the positions of labels; blank if not found with the corresponding label
[Light/Medium/Heavy] Intensity maximum observed abundance of the precursor ion with the corresponding label
Log2 Ratio ML median-centered log2 ratio of Medium to Light intensities, from the maximum observed precursor abundance for each labeled state
Log2 Ratio HL median-centered log2 ratio of Heavy to Light intensities, from the maximum observed precursor abundance for each labeled state
Log2 Ratio HM median-centered log2 ratio of Heavy to Medium intensities, from the maximum observed precursor abundance for each labeled state
Pearson Correlation LM measure of similarity in chromatographic profiles between Light- and Medium-labeled ions
Pearson Correlation LH measure of similarity in chromatographic profiles between Light- and Heavy-labeled ions
Pearson Correlation MH measure of similarity in chromatographic profiles between Medium- and Heavy-labeled ions
[Light/Medium/Heavy] Traced Scans number of MS1 scans quantified for the corresponding label
[Light/Medium/Heavy] Isotopes number of isotopic peaks found for the precursor ion
[Light/Medium/Heavy] Apex Retention Time retention time of the precursor’s apex intensity (usually in seconds but units may vary by instrument type)
[Light/Medium/Heavy] Log10 KL log10 Kullback-Leibler divergence between the observed and theoretical isotope intensity distributions for the ion
[Light/Medium/Heavy] PeptideProphet Probability maximum PeptideProphet probability from all supporting PSMs for the corresponding label
Protein protein sequence header corresponding to the identified peptide ion; this will be the selected razor protein if the peptide maps to multiple proteins (in this case, other mapped proteins are listed in the ‘Mapped Proteins’ column)
Protein ID protein identifier (primary accession number) for the mapped protein
Entry Name entry name for the mapped protein
Gene gene name for the mapped protein
Protein Description name of the mapped protein
Mapped Genes additional genes the identified peptide may originate from (including any arising from I/L substitutions)
Mapped Proteins additional proteins the identified peptide maps to (including any arising from I/L substitutions)
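The ratio, correlation, and isotope-fit columns above rest on standard calculations, sketched below in Python. This is an illustration under stated assumptions, not IonQuant's implementation: the median used for centering is taken over whatever set of ions you pass in (the set IonQuant uses is not specified here), the Pearson helper assumes two profiles sampled at the same MS1 scans, and the KL helper assumes the direction KL(observed || theoretical) with natural-log units before the log10 is taken. All function names are hypothetical.

```python
import math
from statistics import median


def median_centered_log2_ratios(numerator, denominator):
    """Per-ion log2(numerator / denominator), shifted so the median ratio is 0.

    numerator/denominator: maximum observed intensities per ion, same order
    (e.g. Heavy and Light for Log2 Ratio HL). Missing values yield None.
    """
    raw = [math.log2(n / d) if n and d else None for n, d in zip(numerator, denominator)]
    center = median(v for v in raw if v is not None)
    return [None if v is None else v - center for v in raw]


def pearson(xs, ys):
    """Pearson correlation of two chromatographic profiles sampled at the same scans."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def log10_kl(observed, theoretical):
    """log10 of KL(observed || theoretical) over normalized isotope peak intensities."""
    p_sum, q_sum = sum(observed), sum(theoretical)
    kl = 0.0
    for p, q in zip(observed, theoretical):
        if p > 0:
            p_norm, q_norm = p / p_sum, q / q_sum
            kl += p_norm * math.log(p_norm / q_norm)
    # A perfect match gives KL = 0, whose logarithm is -infinity.
    return math.log10(kl) if kl > 0 else float("-inf")


print(median_centered_log2_ratios([4.0, 8.0, 2.0, None], [1.0, 1.0, 1.0, 1.0]))
# [0.0, 1.0, -1.0, None]
print(pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # 1.0
```

Median-centering removes a global offset, such as unequal mixing of Light and Heavy samples, so that a Log2 Ratio of 0 corresponds to the typical ion rather than to an exact 1:1 intensity.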
peptide_label_quant.tsv
peptide_label_quant.tsv contains MS1-based isotopic quantification results from IonQuant summarized to the peptide (stripped sequence) level. See the SILAC tutorial for more information. If only Light and Heavy labels are used, columns with ‘Medium’/‘M’ will be missing.
Peptide Sequence stripped peptide sequence
Modified Peptide peptide sequence plus variable modifications denoted in brackets following modified residues
Peptide Length number of amino acid residues in the sequence
Charges observed precursor charge states for the peptide
Label Count number of potentially labeled sites within the peptide
[Light/Medium/Heavy] Modified Peptide peptide sequence showing variable modifications plus the positions of labels; blank if not found with the corresponding label
Log2 Ratio ML median-centered log2 ratio of Medium to Light intensities, from the maximum observed precursor abundance for each labeled state
Log2 Ratio HL median-centered log2 ratio of Heavy to Light intensities, from the maximum observed precursor abundance for each labeled state
Log2 Ratio HM median-centered log2 ratio of Heavy to Medium intensities, from the maximum observed precursor abundance for each labeled state
Best Pearson Correlation LM highest observed similarity in chromatographic profiles between Light- and Medium-labeled peptides
Best Pearson Correlation LH highest observed similarity in chromatographic profiles between Light- and Heavy-labeled peptides
Best Pearson Correlation MH highest observed similarity in chromatographic profiles between Medium- and Heavy-labeled peptides
Best [Light/Medium/Heavy] PeptideProphet Probability maximum PeptideProphet probability from all supporting PSMs for the corresponding label
Protein protein sequence header corresponding to the identified peptide; this will be the selected razor protein if the peptide maps to multiple proteins (in this case, other mapped proteins are listed in the ‘Mapped Proteins’ column)
Protein ID protein identifier (primary accession number) for the mapped protein
Entry Name entry name for the mapped protein
Gene gene name for the mapped protein
Protein Description name of the mapped protein
Mapped Genes additional genes the identified peptide may originate from (including any arising from I/L substitutions)
Mapped Proteins additional proteins the identified peptide maps to (including any arising from I/L substitutions)
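The ‘Charges’ and ‘Best …’ columns suggest a simple roll-up from ions to stripped sequences: collect the observed charge states and keep the highest value of each per-ion metric. The Python sketch below shows that roll-up on rows shaped like ion_label_quant.tsv records. It is an assumption-based illustration of the column descriptions, not IonQuant's code, and it does not attempt to reproduce how the peptide-level Log2 Ratio columns are derived.

```python
def summarize_peptides(ion_rows, best_fields):
    """Roll ion-level rows up to stripped peptide sequences.

    ion_rows: dicts with 'Peptide Sequence', 'Charge', and numeric fields
    (None when the label was not observed for that ion).
    best_fields: per-ion metrics to keep the highest value of, e.g.
    'Pearson Correlation LH' (reported as 'Best Pearson Correlation LH').
    """
    summary = {}
    for row in ion_rows:
        entry = summary.setdefault(row["Peptide Sequence"], {"Charges": set()})
        entry["Charges"].add(row["Charge"])
        for field in best_fields:
            value = row.get(field)
            if value is None:
                continue  # metric not available for this ion
            entry[field] = max(entry.get(field, value), value)
    return summary


ions = [
    {"Peptide Sequence": "PEPTIDEK", "Charge": 2, "Pearson Correlation LH": 0.91},
    {"Peptide Sequence": "PEPTIDEK", "Charge": 3, "Pearson Correlation LH": 0.97},
    {"Peptide Sequence": "SAMPLER", "Charge": 2, "Pearson Correlation LH": None},
]
print(summarize_peptides(ions, ["Pearson Correlation LH"]))
```

Grouping on the stripped sequence means ions that differ only in modification state or charge are merged, matching the peptide-level scope of this table.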
protein_label_quant.tsv
protein_label_quant.tsv contains MS1-based isotopic quantification results from IonQuant summarized to the protein level. See the SILAC tutorial for more information. If only Light and Heavy labels are used, columns with ‘Medium’/‘M’ will be missing.
Protein protein sequence header
Protein ID protein identifier (primary accession number)
Entry Name entry name for the protein
Gene gene name for the protein
Protein Description name of the protein
Mapped Genes additional genes the supporting peptides map to (including any arising from I/L substitutions)
Mapped Proteins additional proteins the supporting peptides map to (including any arising from I/L substitutions)
Ratios ML number of Medium / Light abundance ratios
Ratios HL number of Heavy / Light abundance ratios
Ratios HM number of Heavy / Medium abundance ratios
Median Log2 Ratios ML median of ion-level log2 Medium / Light abundance ratios
Median Log2 Ratios HL median of ion-level log2 Heavy / Light abundance ratios
Median Log2 Ratios HM median of ion-level log2 Heavy / Medium abundance ratios
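The ‘Ratios’ and ‘Median Log2 Ratios’ columns can be approximated from ion-level rows by grouping on the protein and taking the count and median of the available ion-level log2 ratios. The Python sketch below assumes that the count refers to the same ion-level ratios that enter the median, and it groups on the ‘Protein’ column of ion_label_quant.tsv; which ions IonQuant actually includes (for example, razor assignments only) is not specified here. The protein headers in the example are hypothetical.

```python
from collections import defaultdict
from statistics import median


def protein_ratio_summary(ion_rows, ratio_field="Log2 Ratio HL"):
    """Per-protein (number of ratios, median ion-level log2 ratio).

    ion_rows: dicts with a 'Protein' key and the chosen ratio field
    (None when the ratio could not be computed for that ion).
    """
    per_protein = defaultdict(list)
    for row in ion_rows:
        value = row.get(ratio_field)
        if value is not None:
            per_protein[row["Protein"]].append(value)
    return {protein: (len(values), median(values)) for protein, values in per_protein.items()}


ions = [
    {"Protein": "sp|P00001|PROT1", "Log2 Ratio HL": 1.0},
    {"Protein": "sp|P00001|PROT1", "Log2 Ratio HL": 0.5},
    {"Protein": "sp|P00001|PROT1", "Log2 Ratio HL": 2.0},
    {"Protein": "sp|P00002|PROT2", "Log2 Ratio HL": None},
    {"Protein": "sp|P00002|PROT2", "Log2 Ratio HL": -0.25},
]
print(protein_ratio_summary(ions))
# {'sp|P00001|PROT1': (3, 1.0), 'sp|P00002|PROT2': (1, -0.25)}
```

Using the median rather than the mean keeps a single outlier ion from shifting the protein-level ratio, and the accompanying count indicates how many ratios support each median.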