Machine Learning Approaches to Classify Primary and Metastatic Cancers Using Tissue of Origin-Based DNA Methylation Profiles

Authors

Modhukur Vijayachitra, Sharma Shakshi, Mondal Mainak, Lawarde Ankita, Kask Keiu, Sharma Rajesh, Salumets Andres

Affiliations

Competence Centre on Health Technologies, 50411 Tartu, Estonia.

Department of Obstetrics and Gynecology, Institute of Clinical Medicine, University of Tartu, 50406 Tartu, Estonia.

Publication information

Cancers (Basel). 2021 Jul 27;13(15):3768. doi: 10.3390/cancers13153768.

DOI:10.3390/cancers13153768

PMID:34359669

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC8345047/

Abstract

Metastatic cancers account for up to 90% of cancer-related deaths. The clear differentiation of metastatic cancers from primary cancers is crucial for cancer type identification and developing targeted treatment for each cancer type. DNA methylation patterns are suggested to be an intriguing target for cancer prediction and are also considered to be an important mediator for the transition to metastatic cancer. In the present study, we used 24 cancer types and 9303 methylome samples downloaded from publicly available data repositories, including The Cancer Genome Atlas (TCGA) and the Gene Expression Omnibus (GEO). We constructed machine learning classifiers to discriminate metastatic, primary, and non-cancerous methylome samples. We applied support vector machines (SVM), Naive Bayes (NB), extreme gradient boosting (XGBoost), and random forest (RF) machine learning models to classify the cancer types based on their tissue of origin. RF outperformed the other classifiers, with an average accuracy of 99%. Moreover, we applied local interpretable model-agnostic explanations (LIME) to explain important methylation biomarkers to classify cancer types.
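As a rough illustration of the classification step described in the abstract, the sketch below trains a Gaussian Naive Bayes classifier (one of the four model families named above) on synthetic methylation beta values and reports held-out accuracy. It is not the authors' pipeline: the tissue labels, the number of CpG features, the noise level, and all function names are hypothetical, and the data are simulated rather than taken from TCGA or GEO. It uses only the Python standard library.

```python
import math
import random
from statistics import mean, pvariance

CLASSES = ["breast", "colon", "lung"]  # hypothetical tissue-of-origin labels
N_CPGS = 20                            # hypothetical number of CpG features


def make_centers(seed=0):
    """Draw one mean beta value per CpG for each class (assumed, not real data)."""
    rng = random.Random(seed)
    return {c: [rng.uniform(0.1, 0.9) for _ in range(N_CPGS)] for c in CLASSES}


def simulate(centers, n_per_class, seed):
    """Simulate samples as noisy copies of the class means, clipped to [0, 1]."""
    rng = random.Random(seed)
    X, y = [], []
    for label, mu in centers.items():
        for _ in range(n_per_class):
            X.append([min(1.0, max(0.0, rng.gauss(m, 0.1))) for m in mu])
            y.append(label)
    return X, y


def fit_gaussian_nb(X, y):
    """Estimate a class prior plus a per-feature mean and variance for each class."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        cols = list(zip(*rows))
        model[label] = {
            "prior": len(rows) / len(X),
            "mean": [mean(c) for c in cols],
            # A small constant keeps the variance strictly positive.
            "var": [pvariance(c) + 1e-6 for c in cols],
        }
    return model


def predict(model, x):
    """Return the class with the highest log posterior for sample x."""
    def log_posterior(params):
        total = math.log(params["prior"])
        for v, m, s2 in zip(x, params["mean"], params["var"]):
            total += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        return total
    return max(model, key=lambda label: log_posterior(model[label]))


def accuracy(model, X, y):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(predict(model, x) == t for x, t in zip(X, y)) / len(y)


centers = make_centers()
X_train, y_train = simulate(centers, 60, seed=1)
X_test, y_test = simulate(centers, 30, seed=2)
nb_model = fit_gaussian_nb(X_train, y_train)
test_accuracy = accuracy(nb_model, X_test, y_test)
print(f"held-out accuracy: {test_accuracy:.2f}")
```

Because the simulated classes are far better separated than real methylomes, the accuracy printed here says nothing about the figures reported in the study. The sketch only shows the shape of the workflow: fit on labelled profiles, then score on held-out samples.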

Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1895/8345047/caebad70adf7/cancers-13-03768-g001.jpg
