Using Clinical Notes and Natural Language Processing for Automated HIV Risk Assessment.

Affiliations

Department of Biomedical Informatics, Columbia University, New York, NY.

Division of Infectious Diseases, Department of Medicine, Columbia University, New York, NY.

Publication information

J Acquir Immune Defic Syndr. 2018 Feb 1;77(2):160-166. doi: 10.1097/QAI.0000000000001580.

Abstract

OBJECTIVE

Universal HIV screening programs are costly and labor-intensive, and they often fail to identify high-risk individuals. Automated risk assessment methods that leverage longitudinal electronic health records (EHRs) could catalyze targeted screening programs. Although social and behavioral determinants of health are typically captured in narrative documentation, previous analyses have considered only structured EHR fields. We examined whether natural language processing (NLP) would improve predictive models of HIV diagnosis.

METHODS

The study cohort comprised 181 HIV+ individuals who received care at New York Presbyterian Hospital before a confirmatory HIV diagnosis and 543 HIV-negative controls selected using propensity score matching. EHR data recorded before HIV diagnosis, including demographics, laboratory tests, diagnosis codes, and unstructured notes, were extracted for modeling. Three predictive models were developed using machine-learning algorithms: (1) a baseline model with only structured EHR data, (2) baseline plus NLP topics, and (3) baseline plus NLP clinical keywords.
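The cohort contains exactly three controls per case (181 × 3 = 543). The abstract does not describe the matching procedure, so the following is only a minimal sketch assuming greedy 1:3 nearest-neighbour matching without replacement on propensity scores that have already been estimated (for example by a logistic regression on covariates). The function name `match_controls` and the example scores are hypothetical.

```python
from typing import Dict, List


def match_controls(
    case_scores: Dict[str, float],
    control_scores: Dict[str, float],
    ratio: int = 3,
) -> Dict[str, List[str]]:
    """Hypothetical greedy 1:ratio nearest-neighbour matching on propensity scores.

    Assumes scores are precomputed. Each case takes the `ratio` unused controls
    whose scores are closest to its own; matching is without replacement, so
    every control is used at most once.
    """
    available = dict(control_scores)
    matches: Dict[str, List[str]] = {}
    # Process cases in sorted order so the result is reproducible.
    for case_id in sorted(case_scores):
        score = case_scores[case_id]
        # Rank the remaining controls by absolute distance in propensity score
        # (ties broken by control id).
        nearest = sorted(available, key=lambda c: (abs(available[c] - score), c))[:ratio]
        if len(nearest) < ratio:
            raise ValueError("not enough controls left to match case " + case_id)
        for c in nearest:
            del available[c]  # without replacement
        matches[case_id] = nearest
    return matches
```

With toy scores `{"p1": 0.80, "p2": 0.20}` for cases and seven controls, `p1` takes the three controls nearest 0.80 and `p2` then chooses only from those left over. Greedy matching is order-dependent. Optimal matching or a caliper are common alternatives, and nothing in the abstract indicates which approach the authors used.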

RESULTS

Predictive performance varied across models, with F measures of 0.59 for the baseline model, 0.63 for the baseline + NLP topic model, and 0.74 for the baseline + NLP keyword model. The baseline + NLP keyword model yielded the highest precision by incorporating keywords such as "msm," "unprotected," "hiv," and "methamphetamine," together with structured EHR data indicative of additional HIV risk factors.
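The models are compared by F measure, with precision reported separately. The abstract does not state which F measure variant was used. The sketch below assumes the common balanced F1 (the harmonic mean of precision and recall), with HIV+ as the positive class. The function name is hypothetical, and the toy labels are not study data.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float, float]:
    """Precision, recall and balanced F measure (F1) for the positive class 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 = 2PR / (P + R), the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For four true positives of which a model flags two, and no false alarms, precision is 1.0 and recall is 0.5, so F1 is 2(1.0)(0.5)/1.5 ≈ 0.67. This shows how a model can have high precision yet a moderate F measure.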

CONCLUSIONS

NLP improved the predictive performance of automated HIV risk assessment by extracting terms in clinical text indicative of high-risk behavior. Future studies should explore more advanced techniques for extracting social and behavioral determinants from clinical text.
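The keyword model adds terms extracted from clinical text to the structured baseline features. The abstract names four keywords but gives neither the full vocabulary nor the extraction method. The following is only a minimal sketch that assumes simple whole-token matching on lowercased note text, producing one binary indicator per keyword that is merged with structured fields. The function names and the `kw_` prefix are hypothetical.

```python
import re
from typing import Dict, Iterable, List

# Keywords named in the abstract; the study's full keyword list is not given.
KEYWORDS = ("msm", "unprotected", "hiv", "methamphetamine")

# Lowercase alphanumeric runs; "HIV-negative" yields the tokens "hiv" and "negative".
_TOKEN = re.compile(r"[a-z0-9]+")


def keyword_features(notes: Iterable[str], keywords: Iterable[str] = KEYWORDS) -> Dict[str, int]:
    """Binary indicator per keyword: 1 if it appears as a whole token in any note."""
    tokens = set()
    for note in notes:
        tokens.update(_TOKEN.findall(note.lower()))
    return {"kw_" + k: int(k in tokens) for k in keywords}


def patient_features(structured: Dict[str, float], notes: List[str]) -> Dict[str, float]:
    """Hypothetical baseline + keyword feature vector for one patient."""
    features = dict(structured)
    features.update(keyword_features(notes))
    return features
```

For the notes "Pt reports unprotected sex; MSM." and "Denies methamphetamine use.", the indicators for `msm`, `unprotected`, and `methamphetamine` are all set to 1. The negated mention counts the same as an affirmed one, which illustrates why more advanced extraction techniques (for example, negation detection) could matter.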

