High-dimensional Biomarker Identification for Scalable and Interpretable Disease Prediction via Machine Learning Models.

Authors

Dai Yifan, Zou Fei, Zou Baiming

Affiliations

Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.

Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.

Publication information

bioRxiv. 2024 Oct 7:2024.10.04.616748. doi: 10.1101/2024.10.04.616748.

DOI:10.1101/2024.10.04.616748
PMID:39416165
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11482776/
Abstract

Omics data generated from high-throughput technologies and clinical features jointly impact many complex human diseases. Identifying key biomarkers and clinical risk factors is essential for understanding disease mechanisms and advancing early disease diagnosis and precision medicine. However, the high-dimensionality and intricate associations between disease outcomes and omics profiles present significant analytical challenges. To address these, we propose an ensemble data-driven biomarker identification tool, Hybrid Feature Screening (HFS), to construct a candidate feature set for downstream advanced machine learning models. The pre-screened candidate features from HFS are further refined using a computationally efficient permutation-based feature importance test, forming the comprehensive High-dimensional Feature Importance Test (HiFIT) framework. Through extensive numerical simulations and real-world applications, we demonstrate HiFIT's superior performance in both outcome prediction and feature importance identification. An R package implementing HiFIT is available on GitHub (https://github.com/BZou-lab/HiFIT).
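
The two-stage idea in the abstract (pre-screen a candidate feature set, then refine it with a permutation-based importance measure) can be illustrated with a small Python sketch. This is not the HiFIT R package and not the authors' HFS procedure, whose components the abstract does not specify. Every name here (`hybrid_screen`, `permutation_importance`, the simulated data) is hypothetical. The sketch assumes a "hybrid" screen formed as the union of a Pearson and a rank-correlation screen, uses a k-nearest-neighbour regressor as a stand-in for an advanced machine learning model, and reports mean increase in held-out error rather than a formal test.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def ranks(v):
    """Rank transform (ties are not averaged; adequate for continuous data)."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    for rank, i in enumerate(order):
        r[i] = float(rank)
    return r


def hybrid_screen(X, y, top_k):
    """Assumed screen: union of the top_k features under two marginal scores."""
    p = len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    ry = ranks(y)
    linear = [abs(pearson(c, y)) for c in cols]              # linear association
    monotone = [abs(pearson(ranks(c), ry)) for c in cols]    # rank association
    keep = set()
    for scores in (linear, monotone):
        keep.update(sorted(range(p), key=lambda j: -scores[j])[:top_k])
    return sorted(keep)


def knn_predict(X_train, y_train, x, k=5):
    """Average outcome of the k nearest training points (stand-in model)."""
    nearest = sorted(range(len(X_train)),
                     key=lambda i: math.dist(X_train[i], x))[:k]
    return sum(y_train[i] for i in nearest) / k


def mse(X_train, y_train, X_test, y_test):
    errs = [(knn_predict(X_train, y_train, x) - t) ** 2
            for x, t in zip(X_test, y_test)]
    return sum(errs) / len(errs)


def permutation_importance(X_train, y_train, X_test, y_test, n_repeats, rng):
    """Mean increase in held-out MSE when one feature column is shuffled."""
    base = mse(X_train, y_train, X_test, y_test)
    importances = []
    for j in range(len(X_test[0])):
        total = 0.0
        for _ in range(n_repeats):
            col = [row[j] for row in X_test]
            rng.shuffle(col)
            X_perm = [row[:j] + [v] + row[j + 1:]
                      for row, v in zip(X_test, col)]
            total += mse(X_train, y_train, X_perm, y_test) - base
        importances.append(total / n_repeats)
    return importances


# Simulated data (hypothetical): only features 0 and 1 affect the outcome.
rng = random.Random(0)
n, p, n_train = 300, 30, 200
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [2 * r[0] + 3 * math.tanh(2 * r[1]) + rng.gauss(0, 0.5) for r in X]

# Stage 1: screen on the training split only, to avoid leaking test data.
selected = hybrid_screen(X[:n_train], y[:n_train], top_k=3)

# Stage 2: permutation importance of the screened features on held-out data.
X_sel = [[row[j] for j in selected] for row in X]
scores = permutation_importance(X_sel[:n_train], y[:n_train],
                                X_sel[n_train:], y[n_train:],
                                n_repeats=10, rng=rng)
importance = dict(zip(selected, scores))
top_two = sorted(importance, key=importance.get, reverse=True)[:2]
print("screened features:", selected)
print("two most important:", sorted(top_two))
```

Screening first keeps the permutation step cheap, because only the handful of retained features is shuffled and re-scored instead of all 30. A formal importance test would additionally need a null distribution or standard error for each score, which this sketch omits.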

Figures

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/66cc/11482776/6f121e22f3da/nihpp-2024.10.04.616748v1-f0001.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/66cc/11482776/53331d2cf36e/nihpp-2024.10.04.616748v1-f0002.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/66cc/11482776/0e60c9053801/nihpp-2024.10.04.616748v1-f0003.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/66cc/11482776/10f1d79538ee/nihpp-2024.10.04.616748v1-f0004.jpg
Figure 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/66cc/11482776/c461f391886a/nihpp-2024.10.04.616748v1-f0005.jpg
Figure 6: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/66cc/11482776/28269044f5f5/nihpp-2024.10.04.616748v1-f0006.jpg
