Integrating Somatic Mutations for Breast Cancer Survival Prediction Using Machine Learning Methods.

Authors

He Zongzhen, Zhang Junying, Yuan Xiguo, Zhang Yuanyuan

Affiliations

School of Computer Science and Technology, Xidian University, Xi'an, China.

School of Information and Control Engineering, Qingdao University of Technology, Qingdao, China.

Publication information

Front Genet. 2021 Jan 18;11:632901. doi: 10.3389/fgene.2020.632901. eCollection 2020.

DOI: 10.3389/fgene.2020.632901

PMID: 33537063

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7848170/

Abstract

Breast cancer is the most common malignancy in women, and because of its high mortality rate, computational methods that improve the accuracy of breast cancer survival prediction models are urgently needed. Although multi-omics data such as gene expression have been used extensively in recent studies, accurate prognosis of breast cancer remains a challenge. Somatic mutations are another important and promising data source for studying cancer development, and their effect on breast cancer prognosis remains to be explored further. Meanwhile, these omics datasets are high-dimensional and redundant. We therefore adopted multiple kernel learning (MKL) to integrate somatic mutations with existing molecular data, including gene expression, copy number variation (CNV), methylation, and protein expression data, for the prediction of breast cancer survival. Before integration, the maximum relevance minimum redundancy (mRMR) feature selection method was applied to each data type to select features that are highly relevant to survival and have low redundancy among themselves. The experimental results demonstrated that the proposed method achieved the best performance and that prediction performance improved markedly when somatic mutations were included, indicating that somatic mutations are critical for improving breast cancer survival prediction. Moreover, mRMR was superior to the other feature selection methods used in previous studies, and MKL outperformed the traditional classifiers in multi-omics data integration. Our analysis indicates that by employing promising omics data such as somatic mutations and by combining proper feature selection methods with effective integration frameworks, the accuracy of breast cancer survival prediction can be further increased, thereby supporting better clinical diagnosis and more effective treatment for breast cancer patients.
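
The abstract names mRMR as the per-data-type feature selection step but does not give its details. The following is a minimal sketch of one common greedy variant (the "difference" criterion: relevance minus mean redundancy, both measured by mutual information on discrete values). It is not the authors' implementation. The function names `mutual_information` and `mrmr_select`, the toy binary mutation columns, and the binary survival labels are all assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def mrmr_select(columns, labels, k):
    """Greedy mRMR sketch: return the indices of k selected feature columns.

    At each step, pick the feature with the highest relevance to the labels
    minus its mean redundancy with the features already selected.
    """
    relevance = [mutual_information(col, labels) for col in columns]
    selected = []
    remaining = set(range(len(columns)))
    while remaining and len(selected) < k:
        def score(j):
            if not selected:
                return relevance[j]
            redundancy = sum(
                mutual_information(columns[j], columns[s]) for s in selected
            ) / len(selected)
            return relevance[j] - redundancy

        # sorted() makes tie-breaking deterministic (lowest index wins)
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: 8 patients, 1 = mutated / long-term survivor.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = [
    [1, 1, 1, 0, 0, 0, 0, 0],  # gene_a: relevant to the labels
    [1, 1, 1, 0, 0, 0, 0, 0],  # gene_b: exact copy of gene_a (redundant)
    [1, 0, 1, 1, 0, 1, 0, 0],  # gene_c: weaker relevance, low redundancy
    [1, 0, 1, 0, 1, 0, 1, 0],  # gene_d: independent of the labels
]
selected = mrmr_select(columns, labels, 2)
print(selected)
```

On this toy input the redundant copy `gene_b` is passed over in favor of `gene_c` on the second pick, which is the behavior the abstract attributes to mRMR: high relevance to survival with low redundancy among the selected features. In the workflow described above, a selection step of this kind would run separately on each data type before the MKL integration.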

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a547/7848170/8fe95462e453/fgene-11-632901-g001.jpg

Similar articles

Min-redundancy and max-relevance multi-view feature selection for predicting ovarian cancer survival using multi-omics data.

BMC Med Genomics. 2018 Sep 14;11(Suppl 3):71. doi: 10.1186/s12920-018-0388-0.

Synergistic Effects of Different Levels of Genomic Data for the Staging of Lung Adenocarcinoma: An Illustrative Study.

Genes (Basel). 2021 Nov 24;12(12):1872. doi: 10.3390/genes12121872.

Deep learning based feature-level integration of multi-omics data for breast cancer patients survival analysis.

BMC Med Inform Decis Mak. 2020 Sep 15;20(1):225. doi: 10.1186/s12911-020-01225-8.

The method for breast cancer grade prediction and pathway analysis based on improved multiple kernel learning.

J Bioinform Comput Biol. 2017 Feb;15(1):1650037. doi: 10.1142/S0219720016500372. Epub 2016 Nov 29.

Predicting censored survival data based on the interactions between meta-dimensional omics data in breast cancer.

J Biomed Inform. 2015 Aug;56:220-8. doi: 10.1016/j.jbi.2015.05.019. Epub 2015 Jun 3.

Integrating multi-omics data by learning modality invariant representations for improved prediction of overall survival of cancer.

Methods. 2021 May;189:74-85. doi: 10.1016/j.ymeth.2020.07.008. Epub 2020 Aug 5.

A Translational Pipeline for Overall Survival Prediction of Breast Cancer Patients by Decision-Level Integration of Multi-Omics Data.

Proceedings (IEEE Int Conf Bioinformatics Biomed). 2019 Nov;2019:1573-1580. doi: 10.1109/bibm47256.2019.8983243. Epub 2020 Feb 6.

Classifying breast cancer using multi-view graph neural network based on multi-omics data.

Front Genet. 2024 Feb 20;15:1363896. doi: 10.3389/fgene.2024.1363896. eCollection 2024.

Integrating genomic data and pathological images to effectively predict breast cancer clinical outcome.

Comput Methods Programs Biomed. 2018 Jul;161:45-53. doi: 10.1016/j.cmpb.2018.04.008. Epub 2018 Apr 19.

Cited by

Uncertainty quantification in multi-class image classification using chest X-ray images of COVID-19 and pneumonia.

Front Artif Intell. 2024 Sep 18;7:1410841. doi: 10.3389/frai.2024.1410841. eCollection 2024.

A survey on multi-omics-based cancer diagnosis using machine learning with the potential application in gastrointestinal cancer.

Front Med (Lausanne). 2023 Jan 10;9:1109365. doi: 10.3389/fmed.2022.1109365. eCollection 2022.

Secure tumor classification by shallow neural network using homomorphic encryption.

BMC Genomics. 2022 Apr 9;23(1):284. doi: 10.1186/s12864-022-08469-w.

Integration strategies of multi-omics data for machine learning analysis.

Comput Struct Biotechnol J. 2021 Jun 22;19:3735-3746. doi: 10.1016/j.csbj.2021.06.030. eCollection 2021.
