Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA.
Laboratory of AI and Biomedical Science (LABS), Stevens Neuroimaging and Informatics Institute, Keck School of Medicine of USC, University of Southern California, Los Angeles, California, USA.
Annu Rev Biomed Data Sci. 2024 Aug;7(1):391-418. doi: 10.1146/annurev-biodatasci-102423-121021. Epub 2024 Jul 24.
Alzheimer's disease (AD) is a critical national concern, affecting 5.8 million people and costing more than $250 billion annually. However, there is no available cure. Thus, effective strategies are urgently needed to discover AD biomarkers for early disease detection and drug development. In this review, we study AD from a biomedical data science perspective and discuss the four fundamental components of AD research: genetics (G), molecular multiomics (M), multimodal imaging biomarkers (B), and clinical outcomes (O), collectively referred to as the GMBO framework. We provide a comprehensive review of common statistical and informatics methodologies for each component of the GMBO framework, accompanied by the major findings from landmark AD studies. Our review highlights the potential of multimodal biobank data in addressing key challenges in AD, such as early diagnosis, disease heterogeneity, and therapeutic development. We identify major hurdles in AD research, including data scarcity and complexity, and advocate for enhanced collaboration, data harmonization, and advanced modeling techniques. This review aims to be an essential guide to current biomedical data science strategies in AD research, emphasizing the need for integrated, multidisciplinary approaches to advance our understanding and management of AD.