AutoDC: an automatic machine learning framework for disease classification.

Affiliations

Key Laboratory of High Confidence Software Technologies (MOE), School of CS, Peking University, Beijing, China.

Institute of Computational Social Science, Peking University (Qingdao), Qingdao, China.

Publication information

Bioinformatics. 2022 Jun 27;38(13):3415-3421. doi: 10.1093/bioinformatics/btac334.

DOI: 10.1093/bioinformatics/btac334
PMID: 35583303

Abstract

MOTIVATION

Next-generation sequencing techniques open up tremendous opportunities for researchers to uncover the basic mechanisms of disease at the molecular level. Recently, automatic machine learning (AutoML) frameworks have been employed for genomic and epigenomic data analysis. However, when applied to such high-dimensional data, existing AutoML frameworks suffer from two issues: (i) they cannot effectively filter out redundant features from the original data, and (ii) they usually build the machine learning pipeline by performing feature engineering first and algorithm hyper-parameter tuning afterwards, which can lead to sub-optimal outcomes. Thus, there is an urgent need for a new AutoML framework for high-dimensional omics data analysis.

RESULTS

We introduce AutoDC, an AutoML framework tailored to classifying different diseases from gene expression data. AutoDC uses two novel optimization strategies to improve performance. First, a two-stage feature selection method selects the features with high gene contribution scores. Second, an optimization method based on a two-layer Multi-Armed Bandit framework jointly optimizes feature engineering, algorithm selection and algorithm hyper-parameter tuning. We apply the framework to two public gene expression datasets. Compared with three state-of-the-art AutoML frameworks, AutoDC classifies diseases with higher predictive accuracy.
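The abstract does not describe how the two-layer bandit is wired, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes an upper bandit that chooses among candidate algorithms and, for each algorithm, a lower bandit that chooses whether to spend the next trial on feature engineering or on hyper-parameter tuning. The UCB1 selection rule, the arm names, and the `TOY_SCORES` table of fixed validation accuracies are all illustrative assumptions.

```python
import math


class UCB1:
    """Minimal UCB1 bandit over a fixed list of named arms."""

    def __init__(self, arms):
        self.arms = list(arms)
        self.counts = {arm: 0 for arm in self.arms}
        self.totals = {arm: 0.0 for arm in self.arms}

    def select(self):
        # Play every arm once before relying on the confidence bound.
        for arm in self.arms:
            if self.counts[arm] == 0:
                return arm
        n = sum(self.counts.values())

        def upper_bound(arm):
            mean = self.totals[arm] / self.counts[arm]
            return mean + math.sqrt(2.0 * math.log(n) / self.counts[arm])

        return max(self.arms, key=upper_bound)

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.totals[arm] += reward


def run_two_layer(evaluate, algorithms, sub_tasks, budget):
    """Spend `budget` trials; each trial picks an algorithm, then a sub-task."""
    upper = UCB1(algorithms)                             # layer 1: algorithm
    lower = {alg: UCB1(sub_tasks) for alg in algorithms} # layer 2: sub-task
    best = (None, None, float("-inf"))
    for _ in range(budget):
        alg = upper.select()
        task = lower[alg].select()
        reward = evaluate(alg, task)  # e.g. a validation accuracy in [0, 1]
        lower[alg].update(task, reward)
        upper.update(alg, reward)     # both layers learn from the same reward
        if reward > best[2]:
            best = (alg, task, reward)
    return upper, lower, best


# Hypothetical, fixed validation accuracies standing in for real model fits.
TOY_SCORES = {
    ("random_forest", "feature_engineering"): 0.80,
    ("random_forest", "hyperparameter_tuning"): 0.85,
    ("svm", "feature_engineering"): 0.70,
    ("svm", "hyperparameter_tuning"): 0.65,
}


def toy_evaluate(alg, task):
    return TOY_SCORES[(alg, task)]


upper, lower, best = run_two_layer(
    toy_evaluate,
    algorithms=["random_forest", "svm"],
    sub_tasks=["feature_engineering", "hyperparameter_tuning"],
    budget=200,
)
print(best, upper.counts)
```

Because both layers are updated from the same reward, the budget drifts toward the algorithm whose sub-tasks keep paying off, rather than fixing feature engineering before any tuning begins. In a real pipeline `evaluate` would train and validate a model, and the rewards would be noisy rather than constant.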

AVAILABILITY AND IMPLEMENTATION

The data and code of AutoDC are available at https://github.com/dingdian110/AutoDC. The data underlying this article are available in the article and in its online supplementary material.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
