Machine Learning Approaches to Classify Primary and Metastatic Cancers Using Tissue of Origin-Based DNA Methylation Profiles.

Author information

Modhukur Vijayachitra, Sharma Shakshi, Mondal Mainak, Lawarde Ankita, Kask Keiu, Sharma Rajesh, Salumets Andres

Affiliations

Competence Centre on Health Technologies, 50411 Tartu, Estonia.

Department of Obstetrics and Gynecology, Institute of Clinical Medicine, University of Tartu, 50406 Tartu, Estonia.

Publication information

Cancers (Basel). 2021 Jul 27;13(15):3768. doi: 10.3390/cancers13153768.

Abstract

Metastatic cancers account for up to 90% of cancer-related deaths. Clearly distinguishing metastatic cancers from primary cancers is crucial for identifying the cancer type and for developing targeted treatments for each cancer type. DNA methylation patterns have been proposed as a promising target for cancer prediction and are also considered an important mediator of the transition to metastatic cancer. In the present study, we used 9303 methylome samples covering 24 cancer types, downloaded from publicly available data repositories including The Cancer Genome Atlas (TCGA) and the Gene Expression Omnibus (GEO). We constructed machine learning classifiers to discriminate metastatic, primary, and non-cancerous methylome samples. We applied support vector machines (SVM), Naive Bayes (NB), extreme gradient boosting (XGBoost), and random forest (RF) models to classify the cancer types based on their tissue of origin. RF outperformed the other classifiers, with an average accuracy of 99%. Moreover, we applied local interpretable model-agnostic explanations (LIME) to explain which methylation biomarkers are important for classifying cancer types.
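
The abstract reports random forest (RF) as the best-performing classifier. As a rough illustration of what that model does (bootstrap resampling of samples, a random subset of CpG features considered at each split, and a majority vote across trees), here is a minimal pure-Python sketch. It is not the authors' pipeline: the three toy classes, the four synthetic CpG "beta values", and every function name (`make_toy_methylome`, `train_forest`, `predict_forest`, and so on) are hypothetical, and a real analysis of thousands of TCGA/GEO methylomes would use an established library instead.

```python
import math
import random
from collections import Counter

# Hypothetical class labels for the toy example (not the paper's label scheme).
LABELS = ["primary", "metastatic", "normal"]


def make_toy_methylome(n_per_class, rng):
    """Hypothetical synthetic data: CpG i is hypermethylated (beta 0.8-1.0)
    only in class i and low (0.0-0.2) elsewhere; the last CpG is pure noise."""
    X, y = [], []
    for i, label in enumerate(LABELS):
        for _ in range(n_per_class):
            row = [rng.uniform(0.8, 1.0) if j == i else rng.uniform(0.0, 0.2)
                   for j in range(len(LABELS))]
            row.append(rng.uniform(0.0, 1.0))  # uninformative CpG
            X.append(row)
            y.append(label)
    return X, y


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, features):
    """Lowest weighted-Gini split over the candidate features, as
    (score, feature, threshold), or None if no candidate feature varies."""
    best = None
    for f in features:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint, so both sides are non-empty
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(X, y, depth, n_sub, rng):
    """Grow a depth-limited tree; leaves are majority labels, inner nodes
    are (feature, threshold, left_subtree, right_subtree) tuples."""
    majority = Counter(y).most_common(1)[0][0]
    if depth == 0 or len(set(y)) == 1:
        return majority
    features = rng.sample(range(len(X[0])), n_sub)  # random subset per split
    split = best_split(X, y, features)
    if split is None:
        return majority
    _, f, t = split
    go_left = [row[f] <= t for row in X]
    left = build_tree([r for r, g in zip(X, go_left) if g],
                      [lab for lab, g in zip(y, go_left) if g], depth - 1, n_sub, rng)
    right = build_tree([r for r, g in zip(X, go_left) if not g],
                       [lab for lab, g in zip(y, go_left) if not g], depth - 1, n_sub, rng)
    return (f, t, left, right)


def predict_tree(node, x):
    """Follow thresholds from the root until a leaf label is reached."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def train_forest(X, y, n_trees=25, max_depth=4, seed=0):
    rng = random.Random(seed)
    n_sub = max(1, int(math.sqrt(len(X[0]))))  # common sqrt(n_features) default
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 max_depth, n_sub, rng))
    return forest


def predict_forest(forest, x):
    """Majority vote across all trees."""
    return Counter(predict_tree(tree, x) for tree in forest).most_common(1)[0][0]


if __name__ == "__main__":
    rng = random.Random(42)
    X_train, y_train = make_toy_methylome(30, rng)
    X_test, y_test = make_toy_methylome(10, rng)
    forest = train_forest(X_train, y_train)
    preds = [predict_forest(forest, x) for x in X_test]
    accuracy = sum(p == t for p, t in zip(preds, y_test)) / len(y_test)
    print(f"held-out accuracy on toy data: {accuracy:.2f}")
```

Gini impurity and sqrt(number of features) candidate features per split follow common random forest defaults; both are choices made for this sketch, not details reported in the abstract, and the toy accuracy says nothing about the 99% figure the authors report.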

Figure image: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1895/8345047/caebad70adf7/cancers-13-03768-g001.jpg