Hatef Elham, Rouhizadeh Masoud, Nau Claudia, Xie Fagen, Rouillard Christopher, Abu-Nasser Mahmoud, Padilla Ariadna, Lyons Lindsay Joe, Kharrazi Hadi, Weiner Jonathan P, Roblin Douglas
Center for Population Health Information Technology, Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA.
Institute for Clinical and Translational Research, Johns Hopkins Medical Institute, Baltimore, Maryland, USA.
JAMIA Open. 2022 Feb 16;5(1):ooac006. doi: 10.1093/jamiaopen/ooac006. eCollection 2022 Apr.
To evaluate whether a natural language processing (NLP) algorithm could be adapted to extract, with acceptable validity, markers of residential instability (ie, homelessness and housing insecurity) from electronic health records (EHRs) of 3 healthcare systems.
We included patients 18 years and older who received care at 1 of 3 healthcare systems from 2016 through 2020 and had at least 1 free-text note in the EHR during this period. We conducted the study independently at each site; the NLP algorithm logic and the method of validity assessment were identical across sites, whereas the approach to developing the gold standard for validity assessment differed across sites. Using the EntityRuler module of the spaCy 2.3 Python toolkit, we created at the lead site a rule-based NLP system made up of expert-developed patterns indicating residential instability, and we enriched the system using insights gained from its application at the other 2 sites. We adapted the algorithm at each site, then validated it using a split-sample approach. We assessed the performance of the algorithm by positive predictive value (precision), sensitivity (recall), and specificity.
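As an illustration of the rule-based approach described above, the following is a minimal sketch of token patterns registered with spaCy's EntityRuler. The patterns, the label name `RESIDENTIAL_INSTABILITY`, and the helper functions `build_nlp` and `find_mentions` are hypothetical and are not taken from the study, whose expert-developed patterns are not listed in this abstract. The version check is there because the way the component is added to the pipeline differs between spaCy 2.x (used in the study) and spaCy 3.x.

```python
import spacy

LABEL = "RESIDENTIAL_INSTABILITY"  # hypothetical label name

# Hypothetical patterns for illustration only; not the study's patterns.
PATTERNS = [
    {"label": LABEL, "pattern": [{"LOWER": "homeless"}]},
    {"label": LABEL, "pattern": [{"LOWER": "homelessness"}]},
    {"label": LABEL, "pattern": [{"LOWER": "eviction"}, {"LOWER": "notice"}]},
    {
        "label": LABEL,
        "pattern": [
            {"LOWER": {"IN": ["living", "lives", "staying", "stays"]}},
            {"LOWER": "in"},
            {"LOWER": {"IN": ["a", "his", "her", "their"]}, "OP": "?"},
            {"LOWER": {"IN": ["shelter", "car"]}},
        ],
    },
]


def build_nlp():
    """Create a blank English pipeline containing only an EntityRuler."""
    nlp = spacy.blank("en")
    if spacy.__version__.startswith("2."):
        # spaCy 2.x: construct the component, then add the object.
        from spacy.pipeline import EntityRuler
        ruler = EntityRuler(nlp)
        nlp.add_pipe(ruler)
    else:
        # spaCy 3.x: add the component by its registered name.
        ruler = nlp.add_pipe("entity_ruler")
    ruler.add_patterns(PATTERNS)
    return nlp


def find_mentions(nlp, note):
    """Return the text of every span in a note that matches a pattern."""
    return [ent.text for ent in nlp(note).ents if ent.label_ == LABEL]


if __name__ == "__main__":
    nlp = build_nlp()
    print(find_mentions(nlp, "Patient is currently homeless and staying in a shelter."))
```

A sketch like this matches surface patterns only; it does not handle negation or family history (for example, "denies homelessness" would still match), which a production system would need to address separately.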
The NLP algorithm performed with moderate precision (0.45, 0.73, and 1.0) at the 3 sites. Sensitivity and specificity varied across the 3 sites (sensitivity: 0.68, 0.85, and 0.96; specificity: 0.69, 0.89, and 1.0).
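For readers less familiar with these measures, the sketch below shows how positive predictive value, sensitivity, and specificity follow from patient-level gold-standard labels and algorithm output. The function name `validity_measures` and the example labels are hypothetical and do not reproduce any study data.

```python
def validity_measures(gold, predicted):
    """Compute PPV, sensitivity, and specificity from paired binary labels.

    gold and predicted are equal-length sequences of truthy/falsy values,
    one pair per patient. A measure is None when its denominator is zero.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    pairs = [(bool(g), bool(p)) for g, p in zip(gold, predicted)]
    tp = sum(1 for g, p in pairs if g and p)          # true positives
    fp = sum(1 for g, p in pairs if not g and p)      # false positives
    fn = sum(1 for g, p in pairs if g and not p)      # false negatives
    tn = sum(1 for g, p in pairs if not g and not p)  # true negatives

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    return {
        "ppv": ratio(tp, tp + fp),          # precision
        "sensitivity": ratio(tp, tp + fn),  # recall
        "specificity": ratio(tn, tn + fp),
    }


if __name__ == "__main__":
    # Made-up labels for 8 hypothetical patients.
    gold = [1, 1, 1, 0, 0, 0, 0, 1]
    predicted = [1, 1, 0, 1, 0, 0, 0, 1]
    print(validity_measures(gold, predicted))
```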
The performance of this NLP algorithm to identify residential instability in 3 different healthcare systems suggests the algorithm is generally valid and applicable in other healthcare systems with similar EHRs.
The NLP approach developed in this project is adaptable and can be modified to extract types of social needs other than residential instability from EHRs across different healthcare systems.