A machine learning-based method for feature reduction of methylation data for the classification of cancer tissue origin.

Affiliations

Department of Genome Biology, Faculty of Medicine, Kindai University, Ohnohigashi 377-2, Osaka-Sayama, 589-9511, Japan.

Department of Medical Oncology, Faculty of Medicine, Kindai University, Osaka-Sayama, Japan.

Publication information

Int J Clin Oncol. 2024 Dec;29(12):1795-1810. doi: 10.1007/s10147-024-02617-w. Epub 2024 Sep 18.

Abstract

BACKGROUND

Genomic DNA methylation profiling is a promising but costly method for cancer classification, and it generates large volumes of data. We developed an ensemble learning model that identifies cancer types from the methylation profiles of a limited number of CpG sites.

METHODS

We analyzed methylation data from 890 samples across 10 cancer types in the TCGA database. ANOVA and Gain Ratio were used to select the most significant CpG sites, and Gradient Boosting was then used to reduce these to just 100 sites.
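As an illustration of the two filter scores named above, the following is a minimal sketch of a one-way ANOVA F statistic and a gain ratio computed per CpG site. The abstract does not say how beta values were discretized for Gain Ratio or how the two scores were combined, so the 0.5 threshold, the site names `cg_A` and `cg_B`, and the toy values are all assumptions made for this example, not the authors' settings or data.

```python
import math
from collections import Counter, defaultdict


def anova_f(values, labels):
    """One-way ANOVA F statistic for one feature across class labels."""
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[y].append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    ss_within = sum(
        (v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g
    )
    if ss_within == 0:
        return math.inf if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete items."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def gain_ratio(values, labels, threshold=0.5):
    """Gain ratio of a binary split at `threshold` (an assumed cut-off)."""
    bins = [v >= threshold for v in values]
    n = len(values)
    conditional = 0.0
    for b in set(bins):
        subset = [y for bb, y in zip(bins, labels) if bb == b]
        conditional += len(subset) / n * entropy(subset)
    info_gain = entropy(labels) - conditional
    split_info = entropy(bins)
    return info_gain / split_info if split_info > 0 else 0.0


# Toy beta values for two hypothetical CpG sites across six samples.
labels = ["BRCA", "BRCA", "BRCA", "LUAD", "LUAD", "LUAD"]
sites = {
    "cg_A": [0.90, 0.85, 0.80, 0.10, 0.15, 0.20],  # separates the classes
    "cg_B": [0.50, 0.60, 0.40, 0.50, 0.60, 0.40],  # same pattern in both
}

scores = {
    name: (anova_f(vals, labels), gain_ratio(vals, labels))
    for name, vals in sites.items()
}
# Rank sites by ANOVA F, highest first (one possible ranking rule).
ranked = sorted(scores, key=lambda name: scores[name][0], reverse=True)
for name in ranked:
    f_stat, gr = scores[name]
    print(f"{name}: F={f_stat:.2f}, gain_ratio={gr:.3f}")
```

In a real analysis each score would be computed for every CpG site on the array and only the top-ranked sites would be passed to the next stage.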

RESULTS

This approach maintained high accuracy across multiple machine learning models, with classification accuracy between 87.7% and 93.5% for methods including Extreme Gradient Boosting, CatBoost, and Random Forest. The method reduces the number of features needed without loss of performance, and it helps classify tumors by primary organ and uncover subgroups within specific cancers such as breast and lung.
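The idea of using gradient boosting as a feature selector and then checking a downstream classifier on the reduced feature set can be sketched with scikit-learn as follows. This is not the authors' pipeline: the data are synthetic rather than TCGA methylation profiles, the hyperparameters are arbitrary, and `N_KEEP` is set to 20 rather than the 100 sites reported in the abstract purely to keep the example fast.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a samples-by-CpG-site matrix (assumption, not TCGA).
X, y = make_classification(
    n_samples=300,
    n_features=200,
    n_informative=15,
    n_redundant=0,
    n_classes=3,
    n_clusters_per_class=1,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

N_KEEP = 20  # the abstract reports 100 sites; smaller here for speed

# Fit gradient boosting on the training split only, then rank features
# by their impurity-based importance.
selector = GradientBoostingClassifier(
    n_estimators=30, max_depth=2, random_state=0
)
selector.fit(X_train, y_train)
top_idx = np.argsort(selector.feature_importances_)[::-1][:N_KEEP]

# Train a separate classifier on the reduced feature set and score it
# on held-out samples.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train[:, top_idx], y_train)
accuracy = clf.score(X_test[:, top_idx], y_test)
print(f"kept {len(top_idx)} features, held-out accuracy = {accuracy:.3f}")
```

Selecting features on the training split alone avoids leaking test-set information into the feature ranking, which would otherwise inflate the reported accuracy.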

CONCLUSIONS

Using a gradient boosting feature selector shows potential for streamlining methylation-based cancer classification.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bd6e/11588780/17d6d9fb191d/10147_2024_2617_Fig1_HTML.jpg
