Turrisi Rosanna, Squillario Margherita, Abate Giulia, Uberti Daniela, Barla Annalisa
IEEE J Biomed Health Inform. 2023 Apr 20;PP. doi: 10.1109/JBHI.2023.3268729.
This work represents the first attempt to provide an overview of how to approach data integration, developed through a dialogue between neuroscientists and computer scientists. Data integration is fundamental for studying complex multifactorial diseases, such as neurodegenerative diseases. This work aims to warn readers of common pitfalls and critical issues in both the medical and data science fields. In this context, we define a road map for data scientists approaching data integration in the biomedical domain for the first time, highlighting the challenges that inevitably emerge when dealing with heterogeneous, large-scale, and noisy data, and proposing possible solutions. We discuss data collection and statistical analysis, usually seen as parallel and independent processes, as cross-disciplinary activities. Finally, we provide an exemplary application of data integration to Alzheimer's Disease (AD), the most common multifactorial form of dementia worldwide. We critically discuss the largest and most widely used datasets in AD, and demonstrate how the emergence of machine learning and deep learning methods has had a significant impact on knowledge of the disease, particularly with a view to early AD diagnosis.