Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA.
Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA.
Annu Rev Biomed Data Sci. 2024 Aug;7(1):225-250. doi: 10.1146/annurev-biodatasci-102523-103801. Epub 2024 Jul 24.
The integration of multiomics data with detailed phenotypic insights from electronic health records marks a paradigm shift in biomedical research, offering unparalleled holistic views into health and disease pathways. This review delineates the current landscape of multimodal omics data integration, emphasizing its transformative potential in generating a comprehensive understanding of complex biological systems. We explore robust methodologies for data integration, ranging from concatenation-based to transformation-based and network-based strategies, designed to harness the intricate nuances of diverse data types. Our discussion extends from incorporating large-scale population biobanks to dissecting high-dimensional omics layers at the single-cell level. The review underscores the emerging role of large language models in artificial intelligence, anticipating their influence as a near-future pivot in data integration approaches. Highlighting both achievements and hurdles, we advocate for a concerted effort toward sophisticated integration models, fortifying the foundation for groundbreaking discoveries in precision medicine.