Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, PA 19104.
Center for Biomedical Image Computing and Analytics, University of Pennsylvania, Philadelphia, PA 19104.
Proc Natl Acad Sci U S A. 2023 Feb 7;120(6):e2211613120. doi: 10.1073/pnas.2211613120. Epub 2023 Jan 30.
Although machine learning has shown great promise in many fields of medicine, it has also raised concerns about potential bias and poor generalization across genders, age distributions, races and ethnicities, hospitals, and data acquisition equipment and protocols. In the current study, in the context of three brain diseases, we provide evidence suggesting that, when properly trained, machine learning models can generalize well across diverse conditions and do not necessarily suffer from bias. Specifically, using multistudy magnetic resonance imaging consortia for diagnosing Alzheimer's disease, schizophrenia, and autism spectrum disorder, we find that well-trained models have a high area under the curve (AUC) on subjects across subgroups defined by attributes such as gender, age, racial group, and clinical study, and are unbiased under multiple fairness metrics, including demographic parity difference, equalized odds difference, and equal opportunity difference. We find that models that incorporate multisource data from demographic, clinical, and genetic factors and cognitive scores are also unbiased. These models have a better predictive AUC across subgroups than those trained only with imaging features, but there are also situations in which the additional features do not help.
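The fairness metrics named in the abstract can be illustrated with a minimal sketch. It assumes binary labels and predictions and uses one common convention: each metric is the largest gap between subgroups in selection rate (demographic parity), in true-positive rate (equal opportunity), or in the worse of true-positive and false-positive rate (equalized odds). The function names and the toy data are hypothetical, and the study's exact definitions may differ.

```python
import numpy as np

def group_rates(y_true, y_pred, groups):
    """Per-group selection rate, true-positive rate, and false-positive rate.

    Assumes binary 0/1 labels and predictions, and that every group
    contains at least one positive and one negative subject.
    """
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    rates = {}
    for g in np.unique(groups):
        mask = groups == g
        yt, yp = y_true[mask], y_pred[mask]
        rates[str(g)] = {
            "selection": float(yp.mean()),       # P(pred = 1 | group)
            "tpr": float(yp[yt == 1].mean()),    # P(pred = 1 | label = 1, group)
            "fpr": float(yp[yt == 0].mean()),    # P(pred = 1 | label = 0, group)
        }
    return rates

def fairness_gaps(y_true, y_pred, groups):
    """Largest between-group gap for three common fairness criteria."""
    rates = group_rates(y_true, y_pred, groups)

    def gap(key):
        values = [r[key] for r in rates.values()]
        return max(values) - min(values)

    return {
        "demographic_parity_difference": gap("selection"),
        "equal_opportunity_difference": gap("tpr"),
        # Worst case over the TPR gap and the FPR gap.
        "equalized_odds_difference": max(gap("tpr"), gap("fpr")),
    }

# Toy data (invented for illustration, not from the study).
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 1]
groups = ["F", "F", "F", "F", "M", "M", "M", "M"]

gaps = fairness_gaps(y_true, y_pred, groups)
for name, value in gaps.items():
    print(f"{name}: {value:.2f}")
```

A value of 0 for a metric means the subgroups are treated identically under that criterion. In this toy example the classifier flags every subject in group M, so all three gaps are large.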