The Discovery of Novel Biomarkers Improves Breast Cancer Intrinsic Subtype Prediction and Reconciles the Labels in the METABRIC Data Set.

Author information

Milioli Heloisa Helena, Vimieiro Renato, Riveros Carlos, Tishchenko Inna, Berretta Regina, Moscato Pablo

Affiliations

Priority Research Centre for Bioinformatics, Biomarker Discovery and Information-Based Medicine, Hunter Medical Research Institute, New Lambton Heights, NSW, Australia; School of Environmental and Life Science, The University of Newcastle, Callaghan, NSW, Australia.

Priority Research Centre for Bioinformatics, Biomarker Discovery and Information-Based Medicine, Hunter Medical Research Institute, New Lambton Heights, NSW, Australia; Centro de Informática, Universidade Federal de Pernambuco, Recife, PE, Brazil.

Publication information

PLoS One. 2015 Jul 1;10(7):e0129711. doi: 10.1371/journal.pone.0129711. eCollection 2015.

Abstract

BACKGROUND

The prediction of breast cancer intrinsic subtypes has been introduced as a valuable strategy to determine patient diagnosis, prognosis, and therapy response. The PAM50 method, based on the expression levels of 50 genes, uses a single sample predictor model to assign subtype labels to samples. Intrinsic errors reported within this assay demonstrate the challenge of identifying and understanding the breast cancer groups. In this study, we aim to: a) identify novel biomarkers for subtype individuation by exploring the competence of a newly proposed method named the CM1 score, and b) apply ensemble learning, as opposed to a single classifier, for sample subtype assignment. The overarching objective is to improve class prediction.

METHODS AND FINDINGS

The microarray transcriptome data sets used in this study are the METABRIC breast cancer data, recorded for over 2000 patients, and the public integrated source from the ROCK database, with 1570 samples. We first computed the CM1 score to identify the probes with highly discriminative patterns of expression across samples of each intrinsic subtype. We then assessed the ability of 42 selected probes to assign correct subtype labels using 24 different classifiers from the Weka software suite. For comparison, the same method was applied to the list of 50 genes from the PAM50 method.
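The abstract does not state how the outputs of the 24 classifiers were combined into one subtype label per sample. As a minimal sketch only, assuming a simple majority vote (one common ensemble rule, not necessarily the authors' procedure), the idea can be illustrated as follows. The function names, classifier names, and toy labels are all hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Return (label, agreement) for one sample.

    predictions: list of subtype labels, one per classifier.
    agreement: fraction of classifiers that voted for the winning label.
    Ties go to the label that appears first in the list.
    """
    counts = Counter(predictions)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(predictions)


def ensemble_labels(per_classifier):
    """Combine per-classifier predictions into one label per sample.

    per_classifier: dict mapping a classifier name to its list of
    predicted labels (same sample order in every list).
    """
    columns = list(per_classifier.values())
    n_samples = len(columns[0])
    if any(len(col) != n_samples for col in columns):
        raise ValueError("every classifier must label the same samples")
    return [
        majority_vote([col[i] for col in columns])
        for i in range(n_samples)
    ]


# Hypothetical toy input: three classifiers, three samples.
toy = {
    "clf_a": ["LumA", "Basal", "Her2"],
    "clf_b": ["LumA", "Basal", "LumB"],
    "clf_c": ["LumB", "Basal", "Her2"],
}
result = ensemble_labels(toy)
for label, agreement in result:
    print(label, round(agreement, 2))
```

The agreement fraction gives a rough per-sample measure of how consistently the classifiers agree, which is one way to flag samples whose subtype assignment is ambiguous.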

CONCLUSIONS

The CM1 score identified 30 novel biomarkers for predicting breast cancer subtypes and confirmed the role of 12 well-established genes. Intrinsic subtypes assigned using the CM1 list and the ensemble of classifiers are more consistent and homogeneous than the original PAM50 labels. The new subtypes show accurate distributions of the current clinical markers ER, PR and HER2, and of survival curves, in the METABRIC and ROCK data sets. Remarkably, the paradoxical attribution of the original labels reinforces the limitations of employing a single sample classifier to predict breast cancer intrinsic subtypes.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fee9/4488510/b2c02eed263b/pone.0129711.g001.jpg
