Institute of Health Informatics, University College London, London, UK.
BMJ Health Care Inform. 2022 Apr;29(1). doi: 10.1136/bmjhci-2021-100457.
The Indian Liver Patient Dataset (ILPD) is used extensively to create algorithms that predict liver disease. Given the existing research describing demographic inequities in liver disease diagnosis and management, these algorithms require scrutiny for potential biases. We address this overlooked issue by investigating ILPD models for sex bias.
Following a literature review of ILPD papers, we recreate the models reported in existing studies and interrogate them for bias. We define four experiments, training on sex-unbalanced/balanced data, with and without feature selection. We build random forest (RF), support vector machine (SVM), Gaussian naïve Bayes (GNB) and logistic regression (LR) classifiers, run each experiment 100 times, and report average results with SD.
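A minimal sketch of this repeated-training protocol, assuming scikit-learn, a local copy of the UCI ILPD CSV, hypothetical column names and an 80/20 stratified split; the paper's exact preprocessing may differ, and only the sex-unbalanced, all-features arm is shown:

    # Sketch of the repeated-training protocol described above.
    # Assumptions (not from the paper): file name, column names, 80/20 split.
    import numpy as np
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.svm import SVC
    from sklearn.naive_bayes import GaussianNB
    from sklearn.linear_model import LogisticRegression
    from sklearn.preprocessing import StandardScaler
    from sklearn.pipeline import make_pipeline

    # Hypothetical loading step; the UCI ILPD file has no header row.
    cols = ["Age", "Sex", "TB", "DB", "ALP", "ALT", "AST", "TP", "ALB", "AGR", "Disease"]
    df = pd.read_csv("ilpd.csv", names=cols).dropna()
    df["Sex"] = (df["Sex"] == "Male").astype(int)      # 1 = male, 0 = female
    X = df.drop(columns="Disease")
    y = (df["Disease"] == 1).astype(int)               # 1 = liver patient

    models = {
        "RF": RandomForestClassifier(),
        "SVM": make_pipeline(StandardScaler(), SVC()),
        "GNB": GaussianNB(),
        "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    }

    accs = {name: [] for name in models}
    for seed in range(100):                            # 100 repeated runs
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.2, stratify=y, random_state=seed)
        for name, model in models.items():
            model.fit(X_tr, y_tr)
            accs[name].append(model.score(X_te, y_te))

    for name, a in accs.items():
        print(f"{name}: {np.mean(a):.2%} ({np.std(a):.2%} SD)")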
We reproduce published models, achieving accuracies of >70% (LR 71.31% (2.37 SD) to SVM 79.40% (2.50 SD)), and demonstrate a previously unobserved performance disparity: across all classifiers, females suffer a higher false negative rate (FNR). RF and LR classifiers are presently reported as the most effective models, yet in our experiments they demonstrate the greatest FNR disparity (RF: -21.02%; LR: -24.07%).
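The per-sex FNR probe behind these figures can be sketched as follows; the helper below and the sign convention (male minus female, so a negative disparity means females are missed more often) are our assumptions, and model, X_te and y_te are carried over from the sketch above:

    # Sketch of the bias probe: false negative rate (FNR) per sex.
    import numpy as np

    def fnr(y_true, y_pred):
        # FNR = FN / (FN + TP): share of true patients the model misses.
        y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
        positives = y_true == 1
        return np.mean(y_pred[positives] == 0) if positives.any() else np.nan

    # model, X_te, y_te come from the (hypothetical) sketch above.
    y_pred = model.predict(X_te)
    is_male = (X_te["Sex"] == 1).to_numpy()            # Sex encoded 1 = male
    fnr_m = fnr(y_te[is_male], y_pred[is_male])
    fnr_f = fnr(y_te[~is_male], y_pred[~is_male])
    print(f"FNR male {fnr_m:.2%}, female {fnr_f:.2%}, "
          f"disparity {fnr_m - fnr_f:+.2%}")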
We demonstrate a sex disparity in published ILPD classifiers. In practice, the higher FNR for females would manifest as increased rates of missed diagnosis for female patients and a consequent lack of appropriate care. Our study demonstrates that evaluating biases in the initial stages of machine learning can provide insights into inequalities in current clinical practice, reveal pathophysiological differences between males and females, and mitigate the digitisation of inequalities into algorithmic systems.
Our findings are important to medical data scientists, clinicians and policy-makers involved in the implementation of medical artificial intelligence systems. An awareness of the potential biases of these systems is essential in preventing the digital exacerbation of healthcare inequalities.