Milioli Heloisa H, Vimieiro Renato, Tishchenko Inna, Riveros Carlos, Berretta Regina, Moscato Pablo
Priority Research Centre for Bioinformatics, Biomarker Discovery and Information-Based Medicine, Hunter Medical Research Institute, Lot 1, Kookaburra Circuit, New Lambton Heights, 2305 Australia; School of Environmental and Life Science, The University of Newcastle, University Drive, Callaghan, 2308 Australia.
Centro de Informática, Universidade Federal de Pernambuco, Av. Prof. Moraes Rego, Recife, Brazil.
BioData Min. 2016 Jan 13;9:2. doi: 10.1186/s13040-015-0078-9. eCollection 2016.
Multi-gene lists and single sample predictor models are currently used to reduce the multidimensional complexity of breast cancers and to identify intrinsic subtypes. The perceived inability of some models to cope with high-dimensional data, however, limits the accurate characterisation of these subtypes. To develop a more robust strategy, we designed an iterative approach to consistently discriminate intrinsic subtypes and improve class prediction in the METABRIC dataset.
In this study, we employed the CM1 score to identify the most discriminative probes for each group, and an ensemble learning technique to assess the ability of these probes to assign subtype labels using 24 different classifiers. Our analysis comprises an iterative computation of these methods and statistical measures performed on a set of over 2000 samples. The refined labels assigned using this iterative approach proved more consistent, and in better agreement with clinicopathological markers and patients' overall survival, than those originally provided by the PAM50 method.
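The iterative loop described above (score probes, classify with an ensemble, relabel, repeat) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the CM1 form used here (difference of class means divided by one plus the range of the out-of-class samples) is not stated in this abstract, three nearest-centroid rules stand in for the 24 classifiers, and the function names, the `k` probes kept per class, and the stop-when-stable rule are hypothetical.

```python
import numpy as np

def cm1_scores(X, labels, target):
    """Per-probe score for one class (ASSUMED form of CM1, not from the abstract):
    (mean inside class - mean outside) / (1 + range of the outside samples)."""
    inside = X[labels == target]
    outside = X[labels != target]
    spread = outside.max(axis=0) - outside.min(axis=0)
    return (inside.mean(axis=0) - outside.mean(axis=0)) / (1.0 + spread)

def top_probes(X, labels, k):
    """Union, over the classes present, of the k lowest- and k highest-scoring probes."""
    chosen = set()
    for c in np.unique(labels):
        order = np.argsort(cm1_scores(X, labels, c))
        chosen.update(order[:k].tolist())
        chosen.update(order[-k:].tolist())
    return np.array(sorted(chosen))

def centroid_votes(X, labels, n_classes):
    """Toy ensemble (a stand-in for the 24 classifiers): nearest-centroid
    rules under three distance measures. Returns shape (3, n_samples)."""
    dists = np.full((3, n_classes, X.shape[0]), np.inf)
    for c in range(n_classes):
        members = X[labels == c]
        if len(members) == 0:
            continue  # an empty class keeps infinite distance and is never chosen
        diff = X - members.mean(axis=0)
        dists[0, c] = np.sqrt((diff ** 2).sum(axis=1))  # Euclidean
        dists[1, c] = np.abs(diff).sum(axis=1)          # Manhattan
        dists[2, c] = np.abs(diff).max(axis=1)          # Chebyshev
    return dists.argmin(axis=1)

def majority_vote(votes, n_classes):
    """Most frequent label per sample; ties go to the lowest class index."""
    counts = np.stack([(votes == c).sum(axis=0) for c in range(n_classes)])
    return counts.argmax(axis=0)

def refine_labels(X, labels, n_classes, k=5, max_iter=10):
    """Repeat probe selection and ensemble relabelling until labels stop changing.
    X is (samples x probes); labels are integers in 0..n_classes-1."""
    labels = labels.copy()
    for _ in range(max_iter):
        if len(np.unique(labels)) < 2:
            break  # scoring needs at least two non-empty classes
        probes = top_probes(X, labels, k)
        votes = centroid_votes(X[:, probes], labels, n_classes)
        new_labels = majority_vote(votes, n_classes)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels
```

Calling `refine_labels(X, initial_labels, n_classes)` with an expression matrix and integer-coded starting labels (for example, PAM50 calls mapped to integers) returns the relabelled vector. Re-selecting probes on every pass is a deliberate choice in this sketch, so that corrected labels feed back into feature selection.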
The assignment of intrinsic subtypes has a significant impact on translational research, both for understanding and for managing breast cancer. The refined labelling therefore provides more accurate and reliable information, strengthening the basic science that precedes clinical application.