A machine learning-based method for feature reduction of methylation data for the classification of cancer tissue origin.

Affiliations

Department of Genome Biology, Faculty of Medicine, Kindai University, Ohnohigashi 377-2, Osaka-Sayama, 589-9511, Japan.

Department of Medical Oncology, Faculty of Medicine, Kindai University, Osaka-Sayama, Japan.

Publication information

Int J Clin Oncol. 2024 Dec;29(12):1795-1810. doi: 10.1007/s10147-024-02617-w. Epub 2024 Sep 18.

DOI:10.1007/s10147-024-02617-w
PMID:39292320
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11588780/
Abstract

BACKGROUND

Genome DNA methylation profiling is a promising yet costly method for cancer classification, involving substantial data. We developed an ensemble learning model to identify cancer types using methylation profiles from a limited number of CpG sites.

METHODS

Analyzing methylation data from 890 samples across 10 cancer types from the TCGA database, we utilized ANOVA and Gain Ratio to select the most significant CpG sites, then employed Gradient Boosting to reduce these to just 100 sites.
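The two filter criteria named above, ANOVA and Gain Ratio, can be illustrated with a minimal pure-Python sketch. This is not the authors' code: the CpG site names, beta values, and class labels are made up, and binarizing each site at a beta value of 0.5 before computing the gain ratio is an assumption, since the abstract does not say how continuous methylation values were discretized.

```python
import math
from collections import Counter


def anova_f(values, labels):
    """One-way ANOVA F-statistic of a single feature across class labels."""
    groups = {}
    for v, c in zip(values, labels):
        groups.setdefault(c, []).append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    ss_within = sum(
        (v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g
    )
    if ss_within == 0:
        return math.inf if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete items."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def gain_ratio(values, labels, threshold=0.5):
    """Information gain of a binary split, divided by the split's own entropy.

    The threshold of 0.5 is an illustrative assumption, not from the paper.
    """
    split = [v > threshold for v in values]
    split_info = entropy(split)
    if split_info == 0:  # every sample falls on one side: nothing is learned
        return 0.0
    n = len(values)
    conditional = 0.0
    for side in (True, False):
        subset = [c for s, c in zip(split, labels) if s == side]
        if subset:
            conditional += len(subset) / n * entropy(subset)
    return (entropy(labels) - conditional) / split_info


# Hypothetical toy data: beta values of three made-up CpG sites in six samples.
labels = ["BRCA", "BRCA", "BRCA", "LUAD", "LUAD", "LUAD"]
sites = {
    "cg_A": [0.9, 0.8, 0.85, 0.1, 0.2, 0.15],  # cleanly separates the classes
    "cg_B": [0.5, 0.6, 0.4, 0.5, 0.6, 0.4],    # identical in both classes
    "cg_C": [0.7, 0.3, 0.8, 0.2, 0.6, 0.4],    # weak, noisy difference
}

# Rank sites from most to least discriminative by ANOVA F-statistic.
ranked = sorted(sites, key=lambda s: anova_f(sites[s], labels), reverse=True)
for name in ranked:
    f_stat = anova_f(sites[name], labels)
    gr = gain_ratio(sites[name], labels)
    print(f"{name}: F={f_stat:.2f}, gain ratio={gr:.2f}")
```

A filter stage of this kind scores every CpG site independently, so it scales to genome-wide arrays; the sites with the highest scores would then be handed to the gradient boosting step for further reduction.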

RESULTS

This approach maintained high accuracy across multiple machine learning models, with classification accuracy rates between 87.7% and 93.5% for methods including Extreme Gradient Boosting, CatBoost, and Random Forest. This method effectively minimizes the number of features needed without losing performance, helping to classify primary organs and uncover subgroups within specific cancers like breast and lung.

CONCLUSIONS

Using a gradient boosting feature selector shows potential for streamlining methylation-based cancer classification.
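As a rough end-to-end illustration of the workflow the abstract describes (a statistical filter, then gradient boosting to shrink the feature set, then several classifiers evaluated on the reduced set), the following scikit-learn sketch may help. It rests on several assumptions: the data are synthetic rather than TCGA methylation profiles, the feature counts (200, then 50, then 10) are scaled down from the paper's setting, the Gain Ratio filter is omitted, and scikit-learn's GradientBoostingClassifier, RandomForestClassifier, and LogisticRegression stand in for the models named in the abstract (Extreme Gradient Boosting and CatBoost are not used here).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a (samples x CpG sites) matrix; NOT real methylation data.
X, y = make_classification(
    n_samples=300, n_features=200, n_informative=20, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Stage 1 (filter): keep the 50 features with the highest ANOVA F-score.
# Selection uses the training split only, to avoid leaking test information.
anova = SelectKBest(f_classif, k=50).fit(X_train, y_train)
stage1_idx = anova.get_support(indices=True)

# Stage 2 (embedded): rank the survivors by gradient boosting importance
# and keep the 10 most important ones.
selector = GradientBoostingClassifier(n_estimators=30, random_state=0)
selector.fit(X_train[:, stage1_idx], y_train)
top = np.argsort(selector.feature_importances_)[::-1][:10]
selected_idx = stage1_idx[top]

# Stage 3: check how several classifiers perform on the reduced feature set.
models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(n_estimators=50, random_state=0),
    "logistic_regression": LogisticRegression(max_iter=1000),
}
accuracies = {
    name: model.fit(X_train[:, selected_idx], y_train).score(
        X_test[:, selected_idx], y_test
    )
    for name, model in models.items()
}
for name, acc in accuracies.items():
    print(f"{name}: held-out accuracy = {acc:.3f}")
```

Comparing several model families on the same reduced feature set is one way to check that the selected features carry the signal themselves, rather than only suiting the model that chose them.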

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bd6e/11588780/17d6d9fb191d/10147_2024_2617_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bd6e/11588780/530dd6b0edc0/10147_2024_2617_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bd6e/11588780/cb0d69123d1d/10147_2024_2617_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bd6e/11588780/a870476ff3cc/10147_2024_2617_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bd6e/11588780/6a5f4b80847f/10147_2024_2617_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bd6e/11588780/58c4488a8883/10147_2024_2617_Fig6_HTML.jpg

Similar articles

1
A machine learning-based method for feature reduction of methylation data for the classification of cancer tissue origin.
Int J Clin Oncol. 2024 Dec;29(12):1795-1810. doi: 10.1007/s10147-024-02617-w. Epub 2024 Sep 18.
2
Classification of early and late stage liver hepatocellular carcinoma patients from their genomics and epigenomics profiles.
PLoS One. 2019 Sep 6;14(9):e0221476. doi: 10.1371/journal.pone.0221476. eCollection 2019.
3
Using epigenomics data to predict gene expression in lung cancer.
BMC Bioinformatics. 2015;16 Suppl 5(Suppl 5):S10. doi: 10.1186/1471-2105-16-S5-S10. Epub 2015 Mar 18.
4
DNA Methylation Markers for Pan-Cancer Prediction by Deep Learning.
Genes (Basel). 2019 Oct 4;10(10):778. doi: 10.3390/genes10100778.
5
Diagnostic classification based on DNA methylation profiles using sequential machine learning approaches.
PLoS One. 2024 Sep 6;19(9):e0307912. doi: 10.1371/journal.pone.0307912. eCollection 2024.
6
Development and validation of a machine learning prognostic model based on an epigenomic signature in patients with pancreatic ductal adenocarcinoma.
Int J Med Inform. 2025 Jul;199:105883. doi: 10.1016/j.ijmedinf.2025.105883. Epub 2025 Mar 22.
7
Machine Learning Approaches to Classify Primary and Metastatic Cancers Using Tissue of Origin-Based DNA Methylation Profiles.
Cancers (Basel). 2021 Jul 27;13(15):3768. doi: 10.3390/cancers13153768.
8
Can Predictive Modeling Tools Identify Patients at High Risk of Prolonged Opioid Use After ACL Reconstruction?
Clin Orthop Relat Res. 2020 Jul;478(7):0-1618. doi: 10.1097/CORR.0000000000001251.
9
Diagnostic classification of cancers using DNA methylation of paracancerous tissues.
Sci Rep. 2022 Jun 23;12(1):10646. doi: 10.1038/s41598-022-14786-7.
10
Identification of a small optimal subset of CpG sites as bio-markers from high-throughput DNA methylation profiles.
BMC Bioinformatics. 2008 Oct 27;9:457. doi: 10.1186/1471-2105-9-457.
