Survival analysis in breast cancer using proteomic data from four independent datasets

Ágnes Ősz, András Lánczky & Balázs Győrffy

Scientific Reports volume 11, Article number: 16787 (2021)

https://doi.org/10.1038/s41598-021-96340-5

Abstract

Clinical treatment selection in breast cancer is based on the immunohistochemical determination of four protein biomarkers: ESR1, PGR, HER2, and MKI67. Our aim was to correlate immunohistochemical results with proteome-level technologies in measuring the expression of these markers. We also aimed to integrate available proteome-level breast cancer datasets to identify and validate new prognostic biomarker candidates. We searched for studies involving breast cancer patient cohorts with published survival and proteomic information. Immunohistochemistry and proteomic technologies were compared using the Mann–Whitney test. Receiver operating characteristic (ROC) curves were generated to validate discriminative power. Cox regression and Kaplan–Meier survival analysis were performed to assess prognostic power. The false discovery rate was computed to correct for multiple hypothesis testing. We established a database integrating protein expression data and survival information from four independent cohorts for 1229 breast cancer patients. In all four studies combined, a total of 7342 unique proteins were identified, and 1417 of these were identified in at least three datasets. ESR1, PGR, and HER2 protein expression levels determined by RPPA or LC–MS/MS methods showed a significant correlation with the levels determined by immunohistochemistry (p < 0.0001). PGR and ESR1 levels showed a moderate correlation (correlation coefficient = 0.17, p = 0.0399). An additional panel of candidate proteins, including apoptosis-related proteins (BCL2), adhesion markers (CDH1, CLDN3, CLDN7) and basal markers (cytokeratins), was validated as prognostic biomarkers. Finally, we expanded our previously established web tool designed to validate survival-associated biomarkers by including the proteomic datasets analyzed in this study (https://kmplot.com/). In summary, large proteomic studies now provide sufficient data to enable the validation and ranking of potential protein biomarkers.
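
As an illustration of two of the statistical steps named above, the following minimal Python sketch computes a Kaplan–Meier survival estimate and false discovery rate adjusted p-values. It is not the authors' pipeline: the function names are hypothetical, the input values are made-up toy numbers rather than data from the four cohorts, and the Benjamini–Hochberg procedure is assumed as the FDR method because the abstract does not name one.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate; returns [(time, survival)] at each event time.

    times  -- follow-up time per patient
    events -- 1 if the event was observed, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Events and censored patients both leave the risk set.
        n_at_risk -= removed
    return curve


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order.

    Assumption: the abstract does not state which FDR procedure was used.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda k: pvalues[k])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to enforce monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy inputs (hypothetical, not taken from the study).
toy_curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
toy_adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

In practice, each protein would be tested for association with survival, for example by splitting patients at an expression cutoff and comparing the resulting curves, and the resulting per-protein p-values would then be adjusted together.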