Using pmultiqc with ProteoBench
The proteobench plugin integrates with ProteoBench, a community benchmarking platform for proteomics data analysis workflows. It visualizes the standardized result file produced by ProteoBench to compare quantification performance across conditions.
Supported Input Files
| File | Description |
|---|---|
| `result_performance.csv` (or `.tsv`, `.txt`) | ProteoBench result file with precursor-level quantification |
The file must contain columns produced by ProteoBench's result_performance module. Key expected columns include precursor ion identifiers, charge states, condition-grouped intensities (columns containing abundance_ or _Condition_), and computed fold-change statistics.
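As a rough illustration of the column convention described above, the following sketch picks out intensity columns by substring match. The function name and the example header are hypothetical, and pmultiqc's actual matching logic may differ.

```python
def find_intensity_columns(header):
    """Return column names that contain one of the intensity markers."""
    markers = ("abundance_", "_Condition_")
    return [name for name in header if any(m in name for m in markers)]


# Hypothetical header, for illustration only (not taken from a real result file)
header = ["precursor ion", "Charge", "abundance_A_rep1", "LFQ_Condition_B_rep1", "epsilon"]
print(find_intensity_columns(header))
```

Running a check like this on a file's header before generating a report is a quick way to confirm that the per-run columns will be recognized.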
Running the Report
```shell
# Basic usage: point to the directory containing the ProteoBench result file
multiqc --proteobench-plugin /path/to/proteobench/results -o ./report

# Same run with --disable-hoverinfo added.
# Only one result file is allowed per run; multiple files will raise an error
multiqc --proteobench-plugin /path/to/proteobench/results -o ./report --disable-hoverinfo
```
What ProteoBench Measures
ProteoBench standardizes benchmarking by quantifying a two-condition mixture (Condition A and Condition B) with known fold changes. The result file captures precursor-level quantification across replicates of both conditions.
Precursor Ion Charge Distribution
A bar chart showing the distribution of precursor charge states across all detected precursors. This provides a first-pass view of sample complexity and instrument performance.
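The counts behind such a bar chart can be sketched in a few lines. This is a minimal illustration, assuming charge states are available as integers or integer-like strings; `charge_distribution` is a hypothetical helper, not a pmultiqc function.

```python
from collections import Counter


def charge_distribution(charges):
    """Count precursors per charge state, sorted by charge."""
    counts = Counter(int(c) for c in charges)
    return dict(sorted(counts.items()))
```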
Log2 Mean Intensity Distribution
Line graph and bar chart showing:

- Distribution of mean log2-transformed precursor intensities for Condition A and Condition B
- Number of missing (NA) intensity values per condition

Missing values indicate precursors detected in one condition but not the other, reflecting sensitivity and reproducibility of the quantification workflow.
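The two quantities in this section can be sketched for a single condition as follows. This assumes that a precursor counts as missing when none of its replicates has a positive intensity, and that the mean is taken over log2 values of the observed replicates; ProteoBench's own definition may differ, and the function name is hypothetical.

```python
import math


def log2_mean_and_missing(rows):
    """rows: one list of replicate intensities per precursor (None = not observed).

    Returns (list of mean log2 intensities, number of precursors with no value).
    """
    means, n_missing = [], 0
    for values in rows:
        observed = [v for v in values if v is not None and v > 0]
        if not observed:
            # Assumption: no usable replicate means the condition mean is NA
            n_missing += 1
            continue
        means.append(sum(math.log2(v) for v in observed) / len(observed))
    return means, n_missing
```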
Per-Run Intensity Distribution
Log2 intensity distributions per individual raw file (columns matching abundance_ or _Condition_). This reveals run-to-run variability and potential batch effects across replicates.
Intensity Count per File
Bar chart of the number of quantified precursors in each raw file. Significant variation across replicates suggests data quality issues in specific runs.
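A per-run count of this kind could be computed as in the sketch below. It treats empty cells and the strings `NA` or `NaN` as missing, which is an assumption about how the result file encodes missing values; the function name and column names are hypothetical.

```python
import csv
import io


def quantified_per_run(csv_text, run_columns):
    """Count non-missing intensity values in each run column of a CSV string."""
    counts = {col: 0 for col in run_columns}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for col in run_columns:
            value = (row.get(col) or "").strip()
            # Assumption: empty, "NA" and "NaN" all mean "not quantified"
            if value and value.lower() not in ("na", "nan"):
                counts[col] += 1
    return counts
```

A run whose count falls well below its replicates is a candidate for closer inspection.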
Log2 Intensity Standard Deviation
Distribution of intra-condition standard deviations (log scale). Lower standard deviation indicates higher quantification reproducibility within each condition.
Coefficient of Variation (CV)
Distribution of CVs for precursor intensities across replicates. Median CV is a standard metric for comparing quantification robustness across methods and pipelines.
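For reference, the two spread metrics can be sketched for one precursor within one condition. This assumes the sample standard deviation (n - 1 denominator), with CV computed on the linear scale and the standard deviation on the log2 scale; whether ProteoBench uses exactly these definitions is not stated here, and both function names are hypothetical.

```python
import math
import statistics


def coefficient_of_variation(intensities):
    """CV = sample standard deviation / mean, on linear-scale intensities."""
    values = [v for v in intensities if v is not None]
    if len(values) < 2:
        return None  # spread is undefined for fewer than two replicates
    mean = statistics.fmean(values)
    if mean == 0:
        return None
    return statistics.stdev(values) / mean


def log2_std(intensities):
    """Sample standard deviation of log2-transformed intensities."""
    values = [math.log2(v) for v in intensities if v is not None and v > 0]
    return statistics.stdev(values) if len(values) >= 2 else None
```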
Log2 Fold Change (A vs. B)
Distribution of observed log2 fold changes between Condition A and Condition B. For standard ProteoBench benchmarks (e.g., LFQ benchmark dataset), the expected fold changes are known; this plot allows direct comparison of observed vs. expected ratios.
Epsilon (Deviation from Expected Fold Change)
Difference between observed and expected log2 fold change per precursor. Epsilon close to 0 indicates accurate quantification; systematic offsets suggest normalization or calibration issues.
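The relationship between fold change and epsilon can be sketched as follows. The expected ratios in `EXPECTED_RATIO` are hypothetical placeholders chosen for illustration, not the values of any particular ProteoBench benchmark, and the function names are likewise assumptions.

```python
import math

# Hypothetical expected A:B ratios per species, for illustration only
EXPECTED_RATIO = {"HUMAN": 1.0, "YEAST": 2.0, "ECOLI": 0.25}


def log2_fold_change(mean_log2_a, mean_log2_b):
    """Observed log2 fold change: difference of the condition means on the log2 scale."""
    return mean_log2_a - mean_log2_b


def epsilon(observed_log2fc, species):
    """Observed minus expected log2 fold change; 0 means perfectly accurate."""
    return observed_log2fc - math.log2(EXPECTED_RATIO[species])
```

For example, with these placeholder ratios a YEAST precursor observed at a log2 fold change of 1.0 has an epsilon of 0, while the same observation for a HUMAN precursor has an epsilon of 1.0.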
Log2FC vs. Log2 Mean Intensity (MA-Style Plot)
Scatter plot of log2 fold change against mean log2 intensity across conditions. Intensity-dependent fold change bias (common in low-abundance precursors) is visible as a curved or tilted trend in this plot.
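The coordinates of such a plot, plus a crude numeric check for a tilted trend, can be sketched as below. The least-squares slope is an illustrative addition and not something pmultiqc is known to report; it only captures linear tilt, not curvature. Both function names are hypothetical.

```python
def ma_points(mean_log2_a, mean_log2_b):
    """Return (average log2 intensity, log2 fold change) pairs, one per precursor."""
    return [((a + b) / 2, a - b) for a, b in zip(mean_log2_a, mean_log2_b)]


def trend_slope(points):
    """Least-squares slope of fold change against average intensity.

    A slope near 0 suggests no linear intensity-dependent bias.
    Requires at least two points with distinct x values.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx
```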
Interpreting Results
| Metric | Better Performance |
|---|---|
| Missing value count | Lower |
| CV (intra-condition) | Lower |
| Epsilon | Closer to 0 |
| Log2FC deviation from expected | Smaller spread |
| Precursors per file | Higher and consistent |
Notes
- The ProteoBench plugin expects exactly one result file in the target directory. If multiple files are detected, pmultiqc raises a `ValueError` and exits.
- The module is implemented in `pmultiqc/modules/proteobench/proteobench.py`, with utility functions in `proteobench_utils.py`.
- ProteoBench result files from any workflow (MaxQuant, DIA-NN, FragPipe, etc.) are supported as long as the column naming follows the ProteoBench output format.
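
To catch the one-file restriction before launching a report, a pre-flight check along these lines may help. It is a sketch under stated assumptions: it looks only at the top level of the directory, matches the exact stem `result_performance`, and also raises when no file is found, none of which is confirmed to mirror pmultiqc's own search. `find_result_file` is a hypothetical helper.

```python
from pathlib import Path


def find_result_file(directory):
    """Return the single result_performance file in `directory`, else raise ValueError."""
    suffixes = {".csv", ".tsv", ".txt"}
    matches = sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.stem == "result_performance" and p.suffix in suffixes
    )
    if len(matches) != 1:
        # Assumption: zero files is treated as an error here as well
        raise ValueError(
            f"Expected exactly one ProteoBench result file, found {len(matches)}"
        )
    return matches[0]

# Usage (path is a placeholder):
# result_path = find_result_file("/path/to/proteobench/results")
```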