Department of Public Health, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America.
Computer and Information Science Department, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America.
PLoS One. 2020 Dec 17;15(12):e0240376. doi: 10.1371/journal.pone.0240376. eCollection 2020.
BACKGROUND: The rapid integration of Artificial Intelligence (AI) into the healthcare field has occurred with little communication between computer scientists and doctors. The impact of AI on health outcomes and inequalities calls for health professionals and data scientists to make a collaborative effort to ensure that historic health disparities are not encoded into the future. We present a study that evaluates bias in existing Natural Language Processing (NLP) models used in psychiatry and discuss how these biases may widen health inequalities. Our approach systematically evaluates each stage of model development to explore how biases arise from clinical, data-science, and linguistic perspectives.
DESIGN/METHODS: A literature review of the uses of NLP in mental health was carried out across multiple disciplinary databases with defined MeSH terms and keywords. Our primary analysis evaluated biases within 'GloVe' and 'Word2Vec' word embeddings. Euclidean distances were measured to assess relationships between psychiatric terms and demographic labels, and vector similarity functions were used to solve analogy questions relating to mental health.
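For readers unfamiliar with these techniques, the sketch below illustrates the general approach in Python using the gensim library. It is not the study's code: the pretrained model name, the psychiatric terms, and the demographic labels are illustrative assumptions, and the actual term lists and embeddings used in the analysis are described in the paper itself.

    # Illustrative sketch only: model choice and word lists are assumptions,
    # not the study's protocol.
    import numpy as np
    import gensim.downloader as api

    # Load a pretrained GloVe embedding; Word2Vec vectors can be loaded the
    # same way (e.g. "word2vec-google-news-300").
    model = api.load("glove-wiki-gigaword-100")

    # Hypothetical psychiatric terms and demographic labels.
    psychiatric_terms = ["depression", "anxiety", "schizophrenia"]
    demographic_labels = ["woman", "man", "young", "old"]

    # Euclidean distance between each psychiatric term and each demographic
    # label: smaller distances indicate closer association in embedding space.
    for term in psychiatric_terms:
        for label in demographic_labels:
            dist = np.linalg.norm(model[term] - model[label])
            print(f"{term:15s} <-> {label:10s} Euclidean distance = {dist:.3f}")

    # An analogy question solved with vector similarity, in the style of
    # "man is to doctor as woman is to ?".
    print(model.most_similar(positive=["doctor", "woman"],
                             negative=["man"], topn=3))

Systematic differences in these distances or analogy completions across demographic groups are the kind of signal the study reports as embedding bias.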
RESULTS: Our primary analysis of mental health terminology in GloVe and Word2Vec embeddings demonstrated significant biases with respect to religion, race, gender, nationality, sexuality and age. Our literature review returned 52 papers, none of which addressed all the areas of possible bias that we identify in model development. In addition, only one article appeared in more than one research database, demonstrating that research remains isolated within disciplinary silos, which inhibits cross-disciplinary collaboration and communication.
CONCLUSIONS: Our findings are relevant to professionals who wish to minimize the health inequalities that may arise as a result of AI and data-driven algorithms. We offer primary research identifying biases within these technologies and provide recommendations for avoiding these harms in the future.