College of Medicine, University of Florida, Gainesville, FL, USA.
Regenstrief Institute, Inc., Indianapolis, IN, USA; Richard M. Fairbanks School of Public Health, IUPUI, Indianapolis, IN, USA.
Int J Med Inform. 2023 Sep;177:105115. doi: 10.1016/j.ijmedinf.2023.105115. Epub 2023 Jun 5.
The objective of this study is to validate and report on the portability and generalizability of a Natural Language Processing (NLP) method, originally developed at a different institution, for extracting individual social factors from clinical notes.
A rule-based deterministic state machine NLP model was developed to extract financial insecurity and housing instability using notes from one institution and was then applied to all notes written during a 6-month period at another institution. Ten percent of the notes classified as positive by the NLP model, and an equal number of notes classified as negative, were manually annotated. The NLP model was adjusted to accommodate notes at the new site. Accuracy, positive predictive value, sensitivity, and specificity were calculated.
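The four measures named above follow from the counts of true and false positives and negatives obtained by comparing NLP labels with manual annotations. The sketch below uses the standard definitions; the `ConfusionCounts` class, the function name, and the example counts are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """NLP labels compared against manual annotation (hypothetical container)."""
    tp: int  # NLP positive, annotator positive
    fp: int  # NLP positive, annotator negative
    tn: int  # NLP negative, annotator negative
    fn: int  # NLP negative, annotator positive


def validation_metrics(c: ConfusionCounts) -> dict:
    """Compute the four measures from the standard formulas."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "ppv": c.tp / (c.tp + c.fp),
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
    }


# Illustrative counts only; these are NOT the study's results.
counts = ConfusionCounts(tp=90, fp=10, tn=95, fn=5)
metrics = validation_metrics(counts)
```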
More than 6 million notes were processed by the NLP model at the receiving site, of which about 13,000 and 19,000 were classified as positive for financial insecurity and housing instability, respectively. The NLP model showed excellent performance on the validation dataset, with all measures over 0.87 for both social factors.
Our study illustrated the need to accommodate institution-specific note-writing templates, as well as clinical terminology of emergent diseases, when applying an NLP model for social factors. A state machine is relatively simple to port effectively across institutions. Our study showed superior performance to similar generalizability studies for extracting social factors.
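To illustrate why a rule-based state machine can be ported with small, site-specific adjustments, the toy sketch below keeps the scanning logic fixed and moves everything institution-specific into a configuration object. It is not the authors' model: the `SiteConfig` class, the trigger terms, the negation cues, and the template line are all hypothetical and chosen only for demonstration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteConfig:
    """Site-specific vocabulary (all values here are hypothetical)."""
    trigger_terms: frozenset   # words that suggest housing instability
    negation_cues: frozenset   # words that negate what follows in a sentence
    template_lines: frozenset  # boilerplate lines to skip at this site


def classify_note(text: str, cfg: SiteConfig) -> bool:
    """Return True if the note contains a trigger term that is not negated."""
    for line in text.lower().splitlines():
        if line.strip() in cfg.template_lines:
            continue  # ignore institution-specific template text
        state = "SCAN"
        for token in re.findall(r"[a-z']+|[.;]", line):
            if token in {".", ";"}:
                state = "SCAN"      # sentence boundary clears negation
            elif token in cfg.negation_cues:
                state = "NEGATED"
            elif token in cfg.trigger_terms and state == "SCAN":
                return True
    return False


triggers = frozenset({"homeless", "evicted", "eviction"})
negations = frozenset({"no", "not", "denies"})
checklist = "housing status: homeless / housed / unknown"

origin_site = SiteConfig(triggers, negations, frozenset())
new_site = SiteConfig(triggers, negations, frozenset({checklist}))

positive = classify_note("Patient is currently homeless.", origin_site)
negated = classify_note("Patient denies being homeless.", origin_site)
template_note = "Housing status: homeless / housed / unknown\nPatient lives with family."
before_adjustment = classify_note(template_note, origin_site)
after_adjustment = classify_note(template_note, new_site)
```

In this sketch, the unadjusted configuration flags the checklist line as a positive mention, while the configuration adapted to the new site skips it; the scanning code itself is unchanged.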
A rule-based NLP model for extracting social factors from clinical notes showed strong portability and generalizability across organizationally and geographically distinct institutions. With only relatively simple modifications, we obtained promising performance from an NLP-based model.