Machine Learning and Integrative Analysis of Biomedical Big Data.

Affiliations

NIH BD2K Center of Excellence for Biomedical Computing, University of California Los Angeles, Los Angeles, CA 90095, USA.

Department of Physiology, University of California Los Angeles, Los Angeles, CA 90095, USA.

Publication information

Genes (Basel). 2019 Jan 28;10(2):87. doi: 10.3390/genes10020087.

PMID: 30696086
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6410075/
Abstract

Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.
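The abstract notes that integration compounds single-omics problems such as the curse of dimensionality. As a rough illustration only, the sketch below shows plain early (concatenation-based) integration: each omics block, assumed to be a samples-by-features list of lists with samples in the same row order, is z-scored per feature and the blocks are then joined sample by sample. This is a generic technique chosen for illustration, not a method taken from the review, and the function names `zscore_columns` and `early_integrate` and the toy data are hypothetical.

```python
from statistics import mean, pstdev


def zscore_columns(block):
    """Standardize each feature (column) of a samples-by-features block.

    Constant features are only centered, to avoid dividing by zero.
    """
    n_features = len(block[0])
    stats = []
    for j in range(n_features):
        column = [row[j] for row in block]
        stats.append((mean(column), pstdev(column)))
    return [
        [(x - mu) / sd if sd > 0 else x - mu for x, (mu, sd) in zip(row, stats)]
        for row in block
    ]


def early_integrate(blocks):
    """Early integration: concatenate standardized omics blocks sample by sample.

    Assumes every block lists the same samples in the same row order.
    """
    n_samples = len(blocks[0])
    if any(len(block) != n_samples for block in blocks):
        raise ValueError("all omics blocks must contain the same samples")
    scaled = [zscore_columns(block) for block in blocks]
    # Row i of the result is sample i's features from every block, joined in order.
    return [[x for block in scaled for x in block[i]] for i in range(n_samples)]


# Hypothetical toy data: 3 samples, 3 genotype features and 2 expression features.
genome = [[0, 1, 2], [1, 1, 0], [2, 0, 1]]
transcriptome = [[5.0, 2.0], [3.0, 2.0], [4.0, 2.0]]  # second gene is constant

merged = early_integrate([genome, transcriptome])
print(len(merged), len(merged[0]))  # 3 samples, 3 + 2 = 5 features
```

Concatenation leaves the number of samples unchanged while the number of features grows to the sum of the per-block counts, which is why naive early integration tends to aggravate the many-features, few-samples setting that the review lists as the curse of dimensionality.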

Similar articles

1
Machine Learning and Integrative Analysis of Biomedical Big Data.
Genes (Basel). 2019 Jan 28;10(2):87. doi: 10.3390/genes10020087.
2
Integrative Analysis of Omics Big Data.
Methods Mol Biol. 2018;1754:109-135. doi: 10.1007/978-1-4939-7717-8_7.
3
Biomedical Big Data Technologies, Applications, and Challenges for Precision Medicine: A Review.
Glob Chall. 2023 Nov 20;8(1):2300163. doi: 10.1002/gch2.202300163. eCollection 2024 Jan.
4
Machine learning: its challenges and opportunities in plant system biology.
Appl Microbiol Biotechnol. 2022 May;106(9-10):3507-3530. doi: 10.1007/s00253-022-11963-6. Epub 2022 May 16.
5
Multi-omics data integration approaches for precision oncology.
Mol Omics. 2022 Jul 11;18(6):469-479. doi: 10.1039/d1mo00411e.
6
Machine learning integrative approaches to advance computational immunology.
Genome Med. 2024 Jun 11;16(1):80. doi: 10.1186/s13073-024-01350-3.
7
Machine learning meets omics: applications and perspectives.
Brief Bioinform. 2022 Jan 17;23(1). doi: 10.1093/bib/bbab460.
8
Revisit of Machine Learning Supported Biological and Biomedical Studies.
Methods Mol Biol. 2018;1754:183-204. doi: 10.1007/978-1-4939-7717-8_11.
9
Integrative methods for analyzing big data in precision medicine.
Proteomics. 2016 Mar;16(5):741-58. doi: 10.1002/pmic.201500396.
10
Towards multi-omics characterization of tumor heterogeneity: a comprehensive review of statistical and machine learning approaches.
Brief Bioinform. 2021 May 20;22(3). doi: 10.1093/bib/bbaa188.

Cited by

1
Increasing pathogenic germline variant diagnosis rates in precision medicine: current best practices and future opportunities.
Hum Genomics. 2025 Aug 22;19(1):97. doi: 10.1186/s40246-025-00811-z.
2
A review on multi-omics integration for aiding study design of large scale TCGA cancer datasets.
BMC Genomics. 2025 Aug 22;26(1):769. doi: 10.1186/s12864-025-11925-y.
3
Artificial Intelligence in Hypertrophic Cardiomyopathy: Advances, Challenges, and Future Directions for Personalized Risk Prediction and Management.
Cureus. 2025 Jul 14;17(7):e87907. doi: 10.7759/cureus.87907. eCollection 2025 Jul.
4
A technical review of multi-omics data integration methods: from classical statistical to deep generative approaches.
Brief Bioinform. 2025 Jul 2;26(4). doi: 10.1093/bib/bbaf355.
5
Integrating multi-dimensional data to reveal the mechanisms and molecular targets of baikening granules for treatment of pediatric influenza.
Front Mol Biosci. 2025 Jul 11;12:1637980. doi: 10.3389/fmolb.2025.1637980. eCollection 2025.
6
MIDAS: a technology-enabled hub-and-spoke system for the collection and dissemination of high-quality medical datasets in India.
BMC Med Inform Decis Mak. 2025 Jul 6;25(1):252. doi: 10.1186/s12911-025-03092-7.
7
Decoding the hepatic fibrosis-hepatocellular carcinoma axis: from mechanisms to therapeutic opportunities.
Hepatol Int. 2025 Jul 1. doi: 10.1007/s12072-025-10838-y.
8
Network-based analyses of multiomics data in biomedicine.
BioData Min. 2025 May 27;18(1):37. doi: 10.1186/s13040-025-00452-x.
9
Navigating the Multiverse: a Hitchhiker's guide to selecting harmonization methods for multimodal biomedical data.
Biol Methods Protoc. 2025 Apr 17;10(1):bpaf028. doi: 10.1093/biomethods/bpaf028. eCollection 2025.
10
High-dimensional biomarker identification for interpretable disease prediction via machine learning models.
Bioinformatics. 2025 May 6;41(5). doi: 10.1093/bioinformatics/btaf266.