Adapting Machine Learning Diagnostic Models to New Populations Using a Small Amount of Data: Results from Clinical Neuroscience.

Authors

Wang Rongguang, Erus Guray, Chaudhari Pratik, Davatzikos Christos

Affiliations

Department of Electrical and Systems Engineering, University of Pennsylvania.

Center for AI and Data Science for Integrated Diagnostics, University of Pennsylvania.

Publication information

ArXiv. 2024 Sep 13:arXiv:2308.03175v2.

Abstract

Machine learning (ML) is revolutionizing many areas of engineering and science, including healthcare. However, it also faces a reproducibility crisis, especially in healthcare. ML models that are carefully constructed from and evaluated on data from one part of the population may not generalize well to data from a different population group, or to data acquired with different instrument settings and acquisition protocols. We tackle this problem in the context of neuroimaging of Alzheimer's disease (AD), schizophrenia (SZ) and brain aging. We develop a weighted empirical risk minimization approach that optimally combines data from a source group with a small fraction (10%) of data from a target group in order to make predictions on the target group. Groups are formed by stratifying subjects by attributes such as sex, age group, race and clinical cohort, so the target group may be, e.g., the other sex or a different age group. We apply this method to multi-source data from 15,363 individuals across 20 neuroimaging studies to build ML models for diagnosis of AD and SZ and for estimation of brain age. We find that this approach achieves substantially better accuracy than existing domain adaptation techniques: on all target groups, it obtains an area under the curve (AUC) greater than 0.95 for AD classification, an AUC greater than 0.7 for SZ classification, and a mean absolute error of less than 5 years for brain age prediction, achieving robustness to variations in scanners, protocols, and demographic or clinical characteristics. In some cases, it is even better than training on all data from the target group, because it leverages the diversity and size of a larger training set. We also demonstrate the utility of our models for prognostic tasks such as predicting disease progression in individuals with mild cognitive impairment. Critically, our brain age prediction models lead to new clinical insights regarding correlations with neurophysiological tests. In summary, we present a relatively simple methodology, along with ample experimental evidence, supporting the good generalization of ML models to new datasets and patient cohorts.
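The abstract does not spell out the weighted objective, so the following is only a minimal sketch of what a weighted empirical risk minimization between a source and a target group could look like. Everything in it is an assumption made for illustration: the synthetic data, the logistic-regression model, the single mixing weight `alpha`, the grid of `alpha` values, and the helper names (`weighted_erm`, `fit_weighted_logreg`, `make_group`) are hypothetical and are not taken from the paper. The sketch trains on all source samples plus 10% of the target samples, with `alpha` controlling how much of the total loss comes from the target group, and picks `alpha` on a small held-out part of that 10%.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_bias(X):
    # Append a constant column so the last coefficient acts as an intercept.
    return np.hstack([X, np.ones((len(X), 1))])

def predict_proba(theta, X):
    return 1.0 / (1.0 + np.exp(-add_bias(X) @ theta))

def fit_weighted_logreg(X, y, w, lr=0.5, steps=500, l2=1e-3):
    # Gradient descent on sum_i w_i * logloss_i + l2 * ||theta||^2,
    # with the per-sample weights normalized to sum to one.
    Xb = add_bias(X)
    w = w / w.sum()
    theta = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ theta))
        theta -= lr * (Xb.T @ (w * (p - y)) + 2 * l2 * theta)
    return theta

def weighted_erm(Xs, ys, Xt, yt, alpha):
    # Assumed weighting: the target group carries total weight alpha,
    # the source group carries 1 - alpha, spread evenly within each group.
    X = np.vstack([Xs, Xt])
    y = np.concatenate([ys, yt])
    w = np.concatenate([
        np.full(len(Xs), (1 - alpha) / len(Xs)),
        np.full(len(Xt), alpha / len(Xt)),
    ])
    return fit_weighted_logreg(X, y, w)

def log_loss(theta, X, y):
    p = np.clip(predict_proba(theta, X), 1e-9, 1 - 1e-9)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def make_group(n, coef, shift):
    # Synthetic stand-in for one population group (not real imaging data).
    X = rng.normal(size=(n, 5)) + shift
    y = (X @ coef + 0.5 * rng.normal(size=n) > 0).astype(float)
    return X, y

# Source and target differ in feature distribution and label relationship.
Xs, ys = make_group(2000, np.array([1.5, -1.0, 0.5, 0.0, 0.0]), shift=0.0)
Xt, yt = make_group(1000, np.array([1.8, -0.7, 0.5, 0.0, 0.0]), shift=0.5)

# Only 10% of the target group (100 samples) is available for adaptation:
# 70 for fitting, 30 for choosing alpha. The other 900 are used for testing.
X_fit, y_fit = Xt[:70], yt[:70]
X_val, y_val = Xt[70:100], yt[70:100]
X_test, y_test = Xt[100:], yt[100:]

alphas = [0.0, 0.25, 0.5, 0.75, 1.0]
val_losses = [
    log_loss(weighted_erm(Xs, ys, X_fit, y_fit, a), X_val, y_val)
    for a in alphas
]
best_alpha = alphas[int(np.argmin(val_losses))]

theta = weighted_erm(Xs, ys, X_fit, y_fit, best_alpha)
test_acc = float(np.mean((predict_proba(theta, X_test) > 0.5) == (y_test == 1)))
print(f"chosen alpha = {best_alpha}, target test accuracy = {test_acc:.3f}")
```

In this sketch, `alpha = 0` corresponds to training on the source group only and `alpha = 1` to training on the small target sample only; intermediate values interpolate between the two. How the weights are actually chosen in the paper is not stated in the abstract, so the grid search here should be read as a placeholder.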

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b7c7/11419182/c528978a7a50/nihpp-2308.03175v2-f0001.jpg
