A deep learning approach for transgender and gender diverse patient identification in electronic health records.

Affiliations

Division of General Internal Medicine and Primary Care, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA; Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.

Division of General Internal Medicine and Primary Care, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.

Publication information

J Biomed Inform. 2023 Nov;147:104507. doi: 10.1016/j.jbi.2023.104507. Epub 2023 Sep 29.

Abstract

BACKGROUND

Although accurate identification of gender identity in the electronic health record (EHR) is crucial for providing equitable health care, particularly for transgender and gender diverse (TGD) populations, it remains a challenging task due to incomplete gender information in structured EHR fields.

OBJECTIVE

Using TGD identification as a case study, this research uses natural language processing (NLP) and deep learning to build an accurate predictive model of patient gender identity, aiming to address two challenges: identifying relevant patient-level information in EHR data and reducing annotation work.

METHODS

This study included adult patients in a large healthcare system in Boston, MA, between April 1, 2017 and April 1, 2022. To identify relevant information from a massive volume of clinical notes, we compiled a list of gender-related keywords through expert curation, literature review, and expansion via a fine-tuned BioWordVec model. This keyword list was used to pre-screen potential TGD individuals and create two datasets for model training, testing, and validation. Dataset I was a balanced dataset that contained clinician-confirmed TGD patients and cases without keywords. Dataset II contained cases with keywords. The performance of the deep learning model was compared to traditional machine learning and rule-based algorithms.
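The keyword expansion and pre-screening steps described above can be illustrated with a minimal sketch. It is not the study's pipeline: the toy vectors stand in for embeddings from the fine-tuned BioWordVec model, and the names `TOY_VECTORS`, `expand_keywords`, and `prescreen`, the similarity threshold, and the sample notes are all assumptions made for illustration.

```python
import math
import re

# Hypothetical 3-dimensional vectors standing in for word embeddings.
# The values are invented; a real model would supply far larger vectors.
TOY_VECTORS = {
    "transgender": [0.9, 0.1, 0.0],
    "nonbinary": [0.8, 0.3, 0.1],
    "transmasculine": [0.85, 0.15, 0.05],
    "hypertension": [0.0, 0.2, 0.9],
    "diabetes": [0.1, 0.1, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_keywords(seeds, vectors, threshold=0.9):
    """Add every vocabulary word whose vector is close to a seed keyword.

    The threshold is an assumed value, not one reported by the study.
    """
    expanded = set(seeds)
    for seed in seeds:
        if seed not in vectors:
            continue
        for word, vec in vectors.items():
            if cosine(vectors[seed], vec) >= threshold:
                expanded.add(word)
    return expanded


def prescreen(notes_by_patient, keywords):
    """Return IDs of patients with at least one note containing a keyword."""
    if not keywords:
        return set()
    alternation = "|".join(re.escape(k) for k in sorted(keywords))
    pattern = re.compile(r"\b(" + alternation + r")\b", re.IGNORECASE)
    return {
        pid
        for pid, notes in notes_by_patient.items()
        if any(pattern.search(note) for note in notes)
    }


# Invented example notes keyed by hypothetical patient IDs.
notes = {
    "p1": ["Patient identifies as nonbinary."],
    "p2": ["Follow-up for hypertension."],
    "p3": ["Transmasculine patient, here for annual exam."],
}

expanded = expand_keywords({"transgender"}, TOY_VECTORS)
flagged = prescreen(notes, expanded)
print(sorted(expanded))
print(sorted(flagged))
```

In this sketch, patients flagged by the keyword match would be candidates for a keyword-positive set such as Dataset II, while unflagged patients correspond to the "cases without keywords" used in Dataset I. In the study, a pre-screening match was not taken as confirmation: the TGD patients in Dataset I were clinician-confirmed.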

RESULTS

The final keyword list consisted of 109 keywords, of which 58 (53.2%) were expanded by the BioWordVec model. Dataset I contained 3,150 patients (50% TGD), while Dataset II contained 200 patients (90% TGD). On Dataset I, the deep learning model achieved an F1 score of 0.917, a sensitivity of 0.854, and a precision of 0.980; on Dataset II, it achieved an F1 score of 0.969, a sensitivity of 0.967, and a precision of 0.972. The deep learning model significantly outperformed rule-based algorithms.
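As a reading aid, the F1 score is the harmonic mean of precision and sensitivity (recall). The short sketch below recomputes it from the precision and sensitivity values reported above; the function name `f1_score` and the `reported` dictionary are illustrative. For Dataset II the recomputed value agrees with the reported 0.969. For Dataset I the harmonic mean of the reported precision and sensitivity is about 0.913 rather than the reported 0.917; the abstract does not explain the gap, which could plausibly come from averaging across evaluation runs or from rounding.

```python
def f1_score(precision, sensitivity):
    """Harmonic mean of precision and sensitivity: 2PR / (P + R)."""
    if precision + sensitivity == 0:
        return 0.0
    return 2 * precision * sensitivity / (precision + sensitivity)


# (precision, sensitivity, F1 as reported in the abstract)
reported = {
    "Dataset I": (0.980, 0.854, 0.917),
    "Dataset II": (0.972, 0.967, 0.969),
}

for name, (precision, sensitivity, f1_reported) in reported.items():
    recomputed = round(f1_score(precision, sensitivity), 3)
    print(f"{name}: recomputed F1 = {recomputed}, reported F1 = {f1_reported}")
```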

CONCLUSION

This is the first study to show that deep learning-integrated NLP algorithms can accurately identify gender identity using EHR data. Future work should leverage and evaluate additional diverse data sources to generate more generalizable algorithms.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c51/10687838/a0bf6699d961/nihms-1944426-f0001.jpg
