Department of Bioengineering, University of Pennsylvania, Philadelphia, PA 19104, United States.
Center for Neuroengineering and Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, United States.
J Am Med Inform Assoc. 2024 May 20;31(6):1348-1355. doi: 10.1093/jamia/ocae047.
Large language models (LLMs) could transform health care delivery and research, but they risk propagating existing biases or introducing new ones. In epilepsy, social determinants of health are associated with disparities in access to care, but their impact on seizure outcomes among patients who do have access remains unclear. Here we (1) evaluated our validated, epilepsy-specific LLM for intrinsic bias, and (2) used LLM-extracted seizure outcomes to determine whether seizure outcomes differ across demographic groups.
We tested our LLM for differences and equivalences in prediction accuracy and confidence across demographic groups defined by race, ethnicity, sex, income, and health insurance, using manually annotated notes. Next, we used LLM-classified seizure freedom at each office visit to test for demographic outcome disparities, using univariable and multivariable analyses.
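Testing for both differences and equivalences matters because a non-significant difference test does not by itself show that two groups are equivalent. The sketch below illustrates the idea for classification accuracy in two groups, using a two-proportion z-test for difference and two one-sided tests (TOST) for equivalence. The abstract does not name the specific tests used, so the choice of tests, the function name `two_proportion_tests`, the 5-percentage-point equivalence margin, and the counts in the usage note are all assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_tests(correct_a, n_a, correct_b, n_b, margin=0.05):
    """Return (p_difference, p_equivalence) for the accuracies of two groups.

    Hypothetical helper: `margin` is an assumed equivalence bound on the
    absolute accuracy difference, not a value taken from the study.
    Uses normal approximations, so it assumes reasonably large samples
    and accuracies that are not exactly 0 or 1.
    """
    p_a = correct_a / n_a
    p_b = correct_b / n_b
    diff = p_a - p_b
    norm = NormalDist()

    # Difference test: pooled standard error under H0: p_a == p_b.
    pooled = (correct_a + correct_b) / (n_a + n_b)
    se_pooled = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = diff / se_pooled
    p_difference = 2 * (1 - norm.cdf(abs(z)))

    # Equivalence test (TOST): unpooled standard error, H0: |diff| >= margin.
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    p_lower = 1 - norm.cdf((diff + margin) / se)  # H0: diff <= -margin
    p_upper = norm.cdf((diff - margin) / se)      # H0: diff >= +margin
    p_equivalence = max(p_lower, p_upper)

    return p_difference, p_equivalence
```

For example, with made-up counts of 450/500 correct in one group and 452/500 in another, `two_proportion_tests(450, 500, 452, 500)` gives a large `p_difference` (no evidence of a difference) and a small `p_equivalence` (evidence that the accuracies agree to within the assumed margin).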
We analyzed 84 675 clinic visits from 25 612 unique patients seen at our epilepsy center. We found little evidence of bias in the prediction accuracy or confidence of outcome classifications across demographic groups. Multivariable analysis indicated worse seizure outcomes for female patients (OR 1.33, P ≤ .001), those with public insurance (OR 1.53, P ≤ .001), and those from lower-income zip codes (OR ≥1.22, P ≤ .007). Black patients had worse outcomes than White patients in univariable but not multivariable analysis (OR 1.03, P = .66).
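To make the odds ratios above concrete, the following minimal sketch computes an unadjusted (univariable) odds ratio with a Wald confidence interval from a 2×2 table, where the "event" is a visit without seizure freedom, so an OR above 1 indicates worse outcomes in the comparison group. This is not the study's multivariable model, whose form the abstract does not specify. The function name `odds_ratio` and the counts in the usage note are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def odds_ratio(group_events, group_nonevents, ref_events, ref_nonevents, alpha=0.05):
    """Unadjusted odds ratio of the event in a group versus a reference group.

    Returns (or, ci_low, ci_high) using the Wald interval on the log scale.
    Hypothetical helper; all four cell counts must be greater than zero.
    """
    a, b, c, d = group_events, group_nonevents, ref_events, ref_nonevents
    or_value = (a * d) / (b * c)

    # Standard error of log(OR) for a 2x2 table.
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(1 - alpha / 2)

    ci_low = exp(log(or_value) - z * se_log_or)
    ci_high = exp(log(or_value) + z * se_log_or)
    return or_value, ci_low, ci_high
```

With illustrative counts of 60 non-seizure-free and 40 seizure-free visits in one group against 50 and 50 in the reference group, `odds_ratio(60, 40, 50, 50)` returns an odds ratio of 1.5 together with its 95% interval. An adjusted analysis like the one reported above would instead estimate each odds ratio while holding the other demographic variables fixed.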
We found little evidence that our LLM was intrinsically biased against any demographic group. Seizure freedom extracted by LLM revealed disparities in seizure outcomes across several demographic groups. These findings quantify the critical need to reduce disparities in the care of people with epilepsy.