Liu Chenxi, Zheng Jianing, Liu Yushu, Wang Xi, Zhang Yuting, Fu Qiang, Yu Wenwen, Yu Ting, Jiang Wang, Wang Dan, Liu Chaojie
School of Medicine and Health Management, Tongji Medical College, Huazhong University of Science and Technology, 13 Hangkong Road, Wuhan, Hubei, 430030, China.
Melbourne Institute of Applied Economic and Social Research, Faculty of Business and Economics, The University of Melbourne, 111 Barry St, Carlton, Victoria, 3010, Australia.
Int J Equity Health. 2025 Jul 15;24(1):206. doi: 10.1186/s12939-025-02581-5.
Large language models (LLMs) may perpetuate or amplify social biases toward patients. We systematically assessed potential biases of three popular Chinese LLMs in clinical application scenarios.
We tested whether Qwen, Ernie, and Baichuan encode social biases toward patients of different sex, ethnicity, educational attainment, income level, and health insurance status. First, we prompted the LLMs to generate clinical cases for medical education (n = 8,289) and compared the distribution of patient characteristics in LLM-generated cases with national distributions in China. Second, New England Journal of Medicine Healer clinical vignettes were used to prompt the LLMs to generate differential diagnoses and treatment plans (n = 45,600), with variations analyzed by sociodemographic characteristics. Third, we prompted the LLMs to assess patient needs (n = 51,039) based on clinical cases, revealing any implicit biases toward patients with different characteristics.
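The first probe amounts to a goodness-of-fit comparison between the demographic distribution of LLM-generated cases and a national benchmark. A minimal sketch of that comparison is below, using a hand-rolled chi-square goodness-of-fit statistic; all counts and proportions are hypothetical placeholders for illustration, not the study's data.

```python
# Sketch of the first bias probe: compare the sex distribution of
# LLM-generated clinical cases against a national benchmark with a
# chi-square goodness-of-fit test. Counts and proportions below are
# illustrative assumptions, not figures from the paper.

def chi_square_gof(observed, expected_props):
    """Chi-square goodness-of-fit statistic: observed category counts
    vs. expected proportions summing to 1."""
    total = sum(observed)
    stat = 0.0
    for obs, p in zip(observed, expected_props):
        exp = total * p  # expected count under the benchmark
        stat += (obs - exp) ** 2 / exp
    return stat

# Hypothetical: 8,289 generated cases skewed toward male patients,
# compared with an approximate national sex ratio.
observed = [4973, 3316]          # male, female counts (illustrative)
expected_props = [0.512, 0.488]  # benchmark proportions (illustrative)

stat = chi_square_gof(observed, expected_props)
print(f"chi-square statistic: {stat:.1f}")
```

With one degree of freedom, a statistic far above the 3.84 critical value (p = 0.05) would indicate that the generated cases over-represent one sex relative to the benchmark, which is the kind of deviation the study reports.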
The three LLMs showed social biases, to varying degrees, toward patients with different characteristics in medical education, diagnostic and treatment recommendation, and patient needs assessment. These biases were more frequent in relation to sex, ethnicity, income level, and health insurance status than to educational attainment. Overall, the three LLMs failed to appropriately model the sociodemographic diversity of medical conditions, consistently over-representing male, highly educated, and high-income populations. They also showed higher referral rates for patients from minority ethnic groups and those without insurance or living on low incomes, indicating a potential refusal to treat these patients. The three LLMs were more likely to recommend pain medications for males, and considered patients with higher educational attainment, Han ethnicity, higher income, and health insurance as having healthier relationships with others.
Our findings broaden the scope of potential biases embedded in LLMs and highlight the urgent need for systematic and continuous assessment of social biases in LLMs in real-world clinical applications.