Department of Research and Evaluation, Kaiser Permanente Southern California Medical Group, Pasadena, California, USA.
Division of Research, Kaiser Permanente, Oakland, California, USA.
Inform Health Soc Care. 2021 Mar 2;46(1):18-28. doi: 10.1080/17538157.2020.1828890. Epub 2020 Nov 17.
Accurate identification of transgender persons is a critical first step in conducting transgender health studies. This study aimed to develop an automated algorithm for identifying transgender individuals from electronic medical records (EMR) using free-text clinical notes. The algorithm was developed and validated using data from an integrated healthcare system that served as a participating site in the multicenter Study of Transition Outcomes and Gender. The training and test datasets each contained 300 individuals identified between 2006 and 2014, and both datasets underwent a full medical record review by experienced research abstractors. The validated algorithm was then applied to all clinical notes of patients who received care between January 1, 2015 and June 30, 2018 to identify transgender individuals in the EMR. Validated against the full chart review, the algorithm showed high accuracy: 97% sensitivity, 95% specificity, 94% positive predictive value, and 97% negative predictive value. Of 212,138 candidates with a total of 378,641 clinical notes, the algorithm classified 7,409 individuals (3.5%) as "Definitely transgender" and 679 individuals (0.3%) as "Probably transgender." This computerized natural language processing (NLP) algorithm can support essential efforts to improve the health of transgender people.
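The accuracy figures reported in the abstract are the standard diagnostic-test measures computed from a 2×2 table comparing algorithm output with chart review. The abstract does not report the underlying counts, so the sketch below only defines the four metrics and recomputes the reported proportions of classified individuals from the published counts. The `ConfusionMatrix` class and `proportion` helper are illustrative names, not part of the study's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """2x2 counts comparing algorithm output with chart review (the reference standard)."""
    tp: int  # algorithm positive, chart review positive
    fp: int  # algorithm positive, chart review negative
    fn: int  # algorithm negative, chart review positive
    tn: int  # algorithm negative, chart review negative

    def sensitivity(self) -> float:
        # Share of chart-review-confirmed cases that the algorithm flags.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Share of chart-review-negative individuals the algorithm leaves unflagged.
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        # Positive predictive value: share of flagged individuals confirmed by review.
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        # Negative predictive value: share of unflagged individuals confirmed negative.
        return self.tn / (self.tn + self.fn)


def proportion(count: int, total: int) -> float:
    """Percentage of `total` represented by `count`."""
    return 100.0 * count / total


if __name__ == "__main__":
    # Counts reported in the abstract: classified individuals out of 212,138 candidates.
    candidates = 212_138
    print(f"Definitely transgender: {proportion(7_409, candidates):.1f}%")  # 3.5%
    print(f"Probably transgender: {proportion(679, candidates):.1f}%")  # 0.3%
```

Filling in `ConfusionMatrix` with the test-set counts from the full paper would reproduce the reported sensitivity, specificity, PPV, and NPV; those counts are not given here.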