Section of Biosimulation and Bioinformatics, Center for Medical Statistics, Informatics and Intelligent Systems (CeMSIIS), Medical University of Vienna, Spitalgasse 23, 1090 Vienna, Austria.
Translational Gynecology Group, Department of Obstetrics and Gynecology, Comprehensive Cancer Center, Medical University of Vienna, Waehringer Guertel 18-20, 1090 Vienna, Austria.
Biomed Res Int. 2020 Aug 6;2020:1363827. doi: 10.1155/2020/1363827. eCollection 2020.
Precision medicine for breast cancer relies on biomarkers to select therapies. However, the reliability of biomarkers derived from gene expression arrays has been questioned and calls for reassessment, in particular for large datasets. We revisit widely used data-normalization procedures and evaluate the differences in outcome in order to pinpoint the most reliable reprocessing methods on which biomarkers can be based. We generated a database of 3753 breast cancer patients from 38 studies by downloading and curating patient samples from NCBI-GEO. As gene-expression biomarkers, we select receptor status assessment and breast cancer subtype classification. Each normalization procedure is applied separately, and the biomarkers are then evaluated for each patient. Differences between normalization pipelines are quantified as the percentage of patients whose outcome differs between pipelines. Some normalization procedures lead to quite consistent biomarkers, differing in only 1-2% of patients. Other normalization procedures, some of which have been used in many clinical studies, end up with disturbing discrepancies (10% or more). Much of the doubt regarding the reliability of microarrays may be rooted in the haphazard application of inadequate preprocessing pipelines. Several modes of batch correction are evaluated regarding a possible improvement of receptor prediction from gene expression versus the gold standard of immunohistochemistry. Finally, we nominate those normalization methods that yield consistent and trustworthy results. Adequate bioinformatics data preprocessing is crucial for any subsequent statistics to arrive at trustworthy results. We conclude with a suggestion for future bioinformatics development to further increase the reliability of cancer biomarkers.
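The core comparison in the abstract, counting the share of patients whose biomarker call changes when only the normalization pipeline changes, can be sketched as below. This is a minimal illustration, not the authors' code. It assumes each pipeline has already produced one categorical call per patient (e.g. ER+/ER-, or a subtype label); the pipeline names, patient IDs and IHC labels are hypothetical toy values, and simple percent agreement is assumed here as the comparison against immunohistochemistry.

```python
from itertools import combinations

# Per-patient calls are plain dicts: patient ID -> categorical label
# (e.g. "ER+"/"ER-", or a subtype name). All data below is hypothetical toy data.


def discordance_pct(calls_a, calls_b):
    """Percentage of patients whose call differs between two pipelines."""
    if calls_a.keys() != calls_b.keys():
        raise ValueError("both pipelines must cover the same patients")
    if not calls_a:
        raise ValueError("no patients to compare")
    n_diff = sum(calls_a[p] != calls_b[p] for p in calls_a)
    return 100.0 * n_diff / len(calls_a)


def pairwise_discordance(calls_by_pipeline):
    """Discordance percentage for every unordered pair of pipelines."""
    names = sorted(calls_by_pipeline)
    return {
        (a, b): discordance_pct(calls_by_pipeline[a], calls_by_pipeline[b])
        for a, b in combinations(names, 2)
    }


def agreement_pct(calls, reference):
    """Percent agreement with a reference (assumed: IHC) over patients present in both."""
    shared = [p for p in calls if p in reference]
    if not shared:
        raise ValueError("no patients with a reference value")
    n_agree = sum(calls[p] == reference[p] for p in shared)
    return 100.0 * n_agree / len(shared)


# Hypothetical ER calls from three hypothetical normalization pipelines.
calls = {
    "pipeline_A": {"p1": "ER+", "p2": "ER+", "p3": "ER-", "p4": "ER+"},
    "pipeline_B": {"p1": "ER+", "p2": "ER-", "p3": "ER-", "p4": "ER+"},
    "pipeline_C": {"p1": "ER+", "p2": "ER+", "p3": "ER-", "p4": "ER+"},
}
# Hypothetical IHC-based ER status used as the reference standard.
ihc = {"p1": "ER+", "p2": "ER+", "p3": "ER-", "p4": "ER-"}

if __name__ == "__main__":
    for (a, b), pct in pairwise_discordance(calls).items():
        print(f"{a} vs {b}: {pct:.1f}% of patients discordant")
    for name, patient_calls in calls.items():
        print(f"{name} vs IHC: {agreement_pct(patient_calls, ihc):.1f}% agreement")
```

In the toy data, pipelines A and C agree on every patient while B flips one of four calls (25% discordance). With real data, each inner dict would hold one call per curated patient, and a pair of pipelines would fall into the "consistent" group of the abstract at around 1-2% and into the discrepant group at 10% or more. Percent agreement is only one possible concordance measure against IHC; the study may report others.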