College of Life Science, Shanghai University, Shanghai, 200444, People's Republic of China.
Department of Medical Informatics, Erasmus MC, Rotterdam, The Netherlands.
Mol Genet Genomics. 2019 Feb;294(1):95-110. doi: 10.1007/s00438-018-1488-4. Epub 2018 Sep 10.
Breast cancer is a common and life-threatening malignant disease with multiple biological and clinical subtypes, conventionally categorized as luminal A, luminal B, HER2-positive, and basal-like. Copy number variants (CNVs) have been reported to be promising biomarkers for cancer diagnosis, potentially outperforming mRNA biomarkers because they are considerably more stable and robust than gene expression. Detecting the CNVs that characterize different cancers is therefore worthwhile. To identify CNV biomarkers for breast cancer subtypes, we integrated the CNV data of more than 2000 samples from two large breast cancer databases, METABRIC and The Cancer Genome Atlas (TCGA). We proposed and tested a computational method based on Monte Carlo feature selection (MCFS) and incremental feature selection (IFS) to identify the distinctive core CNVs of each breast cancer subtype. We identified CNV genes that may contribute to breast cancer tumorigenesis and built a set of quantitative rules that distinguish the breast cancer subtypes. The Matthews correlation coefficient (MCC) was 0.515 in tenfold cross-validation on the METABRIC training set and 0.492 in an independent test on the TCGA dataset. The CNVs of PGAP3, GRB7, MIR4728, PNMT, STARD3, TCAP and ERBB2 were important for accurately diagnosing breast cancer subtypes. These findings may help clarify the differences between breast cancer subtypes and improve diagnostic accuracy.
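The abstract names the MCFS + IFS pipeline and the MCC metric but not their internals. As a rough illustration only, the sketch below assumes a feature ranking (in the paper, produced by MCFS) is already available and shows the IFS step: score the top-1, top-2, ... ranked features with tenfold cross-validation and keep the feature-set size with the highest MCC. Two further assumptions: the reported MCC is taken to be the multiclass generalization (Gorodkin's R_K, the definition scikit-learn's `matthews_corrcoef` uses for more than two classes), and the classifier is a toy nearest-centroid model. The function names, the round-robin fold split, and the toy data are all hypothetical and are not the authors' code or classifier.

```python
import math
from collections import Counter
from typing import Callable, Hashable, List, Sequence, Tuple

Matrix = List[List[float]]


def multiclass_mcc(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Multiclass MCC (Gorodkin's R_K) from true and predicted labels."""
    n = len(y_true)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    t_count = Counter(y_true)  # samples per true class
    p_count = Counter(y_pred)  # samples per predicted class
    labels = set(t_count) | set(p_count)
    cov_tp = correct * n - sum(t_count[k] * p_count[k] for k in labels)
    cov_tt = n * n - sum(t_count[k] ** 2 for k in labels)
    cov_pp = n * n - sum(p_count[k] ** 2 for k in labels)
    denom = math.sqrt(cov_tt * cov_pp)
    return cov_tp / denom if denom else 0.0  # 0.0 if one side has a single class


def nearest_centroid_cv_mcc(X: Matrix, y: Sequence[Hashable],
                            features: Sequence[int], folds: int = 10) -> float:
    """Cross-validated MCC of a hypothetical nearest-centroid classifier.

    Folds are assigned round-robin (sample i goes to fold i % folds).
    """
    n = len(y)
    preds: list = [None] * n
    for f in range(folds):
        train = [i for i in range(n) if i % folds != f]
        test = [i for i in range(n) if i % folds == f]
        # Per-class mean of the selected feature columns, from training rows only.
        sums: dict = {}
        counts: Counter = Counter()
        for i in train:
            acc = sums.setdefault(y[i], [0.0] * len(features))
            counts[y[i]] += 1
            for j, col in enumerate(features):
                acc[j] += X[i][col]
        centroids = {c: [v / counts[c] for v in acc] for c, acc in sums.items()}
        # Assign each held-out row to the class whose centroid is closest.
        for i in test:
            x = [X[i][col] for col in features]
            preds[i] = min(
                centroids,
                key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
            )
    return multiclass_mcc(y, preds)


def incremental_feature_selection(
    X: Matrix,
    y: Sequence[Hashable],
    ranked: Sequence[int],
    evaluate: Callable[[Matrix, Sequence[Hashable], Sequence[int]], float],
) -> Tuple[int, List[float]]:
    """Score the top-1, top-2, ... ranked features; return best size and MCC curve."""
    curve = [evaluate(X, y, ranked[:k]) for k in range(1, len(ranked) + 1)]
    best_k = 1 + max(range(len(curve)), key=curve.__getitem__)  # ties -> smaller set
    return best_k, curve


# Invented toy data: column 0 separates classes A and B, column 1 is noise.
# The ranking deliberately lists the noise column first.
X = [[0.0 if i < 10 else 10.0, float(i % 2)] for i in range(20)]
y = ["A" if i < 10 else "B" for i in range(20)]
best_k, curve = incremental_feature_selection(X, y, [1, 0], nearest_centroid_cv_mcc)
print(best_k, curve)  # 2 [0.0, 1.0]
```

Ties in the MCC curve are resolved toward the smaller feature set, one reasonable convention when a compact biomarker panel is preferred; the published method may use a different rule.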