Adapting Machine Learning Diagnostic Models to New Populations Using a Small Amount of Data: Results from Clinical Neuroscience.

Authors

Wang Rongguang, Erus Guray, Chaudhari Pratik, Davatzikos Christos

Affiliations

Department of Electrical and Systems Engineering, University of Pennsylvania.

Center for AI and Data Science for Integrated Diagnostics, University of Pennsylvania.

Publication information

ArXiv. 2024 Sep 13:arXiv:2308.03175v2.

PMID:39314511

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11419182/

Abstract

Machine learning (ML) is revolutionizing many areas of engineering and science, including healthcare. However, it is also facing a reproducibility crisis, especially in healthcare. ML models that are carefully constructed from and evaluated on data from one part of the population may not generalize well to data from a different population group, or to data acquired with different instrument settings and acquisition protocols. We tackle this problem in the context of neuroimaging of Alzheimer's disease (AD), schizophrenia (SZ) and brain aging. We develop a weighted empirical risk minimization approach that optimally combines data from a source group with a small fraction (10%) of data from a target group to make predictions on the target group. Groups are defined by stratifying subjects on attributes such as sex, age group, race and clinical cohort, so the target group may be, for example, the other sex or a different age group. We apply this method to multi-source data of 15,363 individuals from 20 neuroimaging studies to build ML models for diagnosis of AD and SZ, and estimation of brain age. We found that this approach achieves substantially better accuracy than existing domain adaptation techniques: it obtains an area under the curve (AUC) greater than 0.95 for AD classification, an AUC greater than 0.7 for SZ classification, and a mean absolute error of less than 5 years for brain age prediction on all target groups, achieving robustness to variations in scanners, protocols, and demographic or clinical characteristics. In some cases, it is even better than training on all data from the target group, because it leverages the diversity and size of a larger training set. We also demonstrate the utility of our models for prognostic tasks such as predicting disease progression in individuals with mild cognitive impairment. Critically, our brain age prediction models lead to new clinical insights regarding correlations with neurophysiological tests. In summary, we present a relatively simple methodology, along with ample experimental evidence, supporting the good generalization of ML models to new datasets and patient cohorts.
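The abstract names weighted empirical risk minimization but does not spell out the objective. One common form, assumed here for illustration only, minimizes (1 - alpha) times the average loss on source samples plus alpha times the average loss on the few available target samples, with alpha chosen on held-out target data. The sketch below applies that assumed objective to a toy logistic regression. The names `weighted_erm`, `alpha`, `make_group`, the synthetic data and the selection of alpha on a held-out split are hypothetical and are not taken from the paper.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_weighted_logreg(X, y, w, lr=0.5, steps=300):
    """Gradient descent on sum_i w[i] * log_loss(x_i, y_i); w should sum to 1."""
    d = len(X[0])
    theta = [0.0] * (d + 1)  # d feature weights followed by a bias term
    for _ in range(steps):
        grad = [0.0] * (d + 1)
        for xi, yi, wi in zip(X, y, w):
            p = sigmoid(sum(t * v for t, v in zip(theta, xi)) + theta[-1])
            g = wi * (p - yi)  # per-sample weight scales the gradient
            for j in range(d):
                grad[j] += g * xi[j]
            grad[-1] += g
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta


def weighted_erm(Xs, ys, Xt, yt, alpha):
    """Assumed objective: total weight (1 - alpha) on source, alpha on target."""
    w = [(1.0 - alpha) / len(Xs)] * len(Xs) + [alpha / len(Xt)] * len(Xt)
    return fit_weighted_logreg(Xs + Xt, ys + yt, w)


def accuracy(theta, X, y):
    hits = 0
    for xi, yi in zip(X, y):
        score = sum(t * v for t, v in zip(theta, xi)) + theta[-1]
        hits += int((score > 0) == (yi == 1))
    return hits / len(X)


def make_group(n, shift, rng):
    # Hypothetical synthetic group: the class signal sits in feature 0, and
    # `shift` moves the whole group to mimic a population or scanner effect.
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        X.append([2.0 * label - 1.0 + shift + rng.gauss(0, 1), rng.gauss(0, 1)])
        y.append(label)
    return X, y


rng = random.Random(0)
Xs, ys = make_group(200, 0.0, rng)      # large source group
Xt, yt = make_group(200, 1.5, rng)      # shifted target group
Xt_fit, yt_fit = Xt[:10], yt[:10]       # 10% of target data is available:
Xt_val, yt_val = Xt[10:20], yt[10:20]   # half to fit, half to choose alpha
Xt_test, yt_test = Xt[20:], yt[20:]     # the rest is held out for evaluation

alphas = [0.0, 0.25, 0.5, 0.75, 1.0]
best_alpha = max(alphas, key=lambda a: accuracy(
    weighted_erm(Xs, ys, Xt_fit, yt_fit, a), Xt_val, yt_val))
theta = weighted_erm(Xs, ys, Xt_fit, yt_fit, best_alpha)
print("chosen alpha:", best_alpha)
print("target test accuracy:", accuracy(theta, Xt_test, yt_test))
```

In this sketch, `alpha = 0` ignores the target samples entirely and `alpha = 1` trains only on them; intermediate values let the small target sample steer a model that still benefits from the larger source group.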

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b7c7/11419182/c528978a7a50/nihpp-2308.03175v2-f0001.jpg
