Natural Language Processing and Machine Learning to Identify People Who Inject Drugs in Electronic Health Records.

Authors

Goodman-Meza David, Tang Amber, Aryanfar Babak, Vazquez Sergio, Gordon Adam J, Goto Michihiko, Goetz Matthew Bidwell, Shoptaw Steven, Bui Alex A T

Affiliations

Division of Infectious Diseases, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, USA.

Veterans Affairs Greater Los Angeles Healthcare System, Los Angeles, California, USA.

Publication information

Open Forum Infect Dis. 2022 Sep 12;9(9):ofac471. doi: 10.1093/ofid/ofac471. eCollection 2022 Sep.

Abstract

BACKGROUND

Improving the identification of people who inject drugs (PWID) in electronic medical records can improve clinical decision making, risk assessment and mitigation, and health service research. Identification of PWID currently consists of heterogeneous, nonspecific International Classification of Diseases (ICD) codes as proxies. Natural language processing (NLP) and machine learning (ML) methods may have better diagnostic metrics than nonspecific codes for identifying PWID.

METHODS

We manually reviewed 1000 records of patients diagnosed with bacteremia admitted to Veterans Health Administration hospitals from 2003 through 2014. The manual review was the reference standard. We developed and trained NLP/ML algorithms with and without regular expression filters for negation (NegEx) and compared these with 11 proxy combinations of ICD codes to identify PWID. Data were split 70% for training and 30% for testing. We calculated diagnostic metrics and estimated 95% confidence intervals (CIs) by bootstrapping the hold-out test set. Best models were determined by best F-score, a summary of sensitivity and positive predictive value.
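As a rough illustration of what a regular expression filter for negation does, the sketch below removes negated mentions of injection drug use from a note before any downstream classifier would see them. It is not the authors' implementation and not the full NegEx lexicon: the trigger list, the target phrases, the five-word window, and the function names `strip_negated_mentions` and `has_affirmed_mention` are all assumptions made for this example.

```python
import re

# Hypothetical, deliberately tiny trigger list; the real NegEx lexicon is far larger.
NEGATION_TRIGGERS = r"(?:no|denies|denied|without|negative for)"
# Hypothetical target phrases for injection drug use.
TARGET = r"(?:ivdu|iv drug use|injection drug use|injects? (?:heroin|drugs))"

# A trigger, then at most five intervening words, then a target phrase.
# Punctuation is not matched by \w, so a period or comma ends the negation scope.
NEGATED = re.compile(
    rf"\b{NEGATION_TRIGGERS}\b(?:\s+\w+){{0,5}}?\s+{TARGET}\b",
    re.IGNORECASE,
)
MENTION = re.compile(rf"\b{TARGET}\b", re.IGNORECASE)


def strip_negated_mentions(note: str) -> str:
    """Blank out negated target mentions so they cannot count as evidence."""
    return NEGATED.sub(" ", note)


def has_affirmed_mention(note: str) -> bool:
    """True if a target phrase is still present after negated ones are removed."""
    return MENTION.search(strip_negated_mentions(note)) is not None
```

With these assumed patterns, a note such as "Patient denies IV drug use." no longer contains a target phrase after filtering, while "History of injection drug use, last used 2010." still does. A filter this small will miss negations that span commas or longer clauses, which is one reason to evaluate models both with and without it.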

RESULTS

Random forest with and without NegEx were the best-performing NLP/ML algorithms in the training set. Random forest with NegEx outperformed all ICD-based algorithms. F-score for the best NLP/ML algorithm was 0.905 (95% CI, .786-.967) and 0.592 (95% CI, .550-.632) for the best ICD-based algorithm. The NLP/ML algorithm had a sensitivity of 92.6% and specificity of 95.4%.
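For readers who want to see how metrics of this kind are typically derived, the following minimal sketch computes sensitivity, specificity, positive predictive value, and an F-score from binary labels, then estimates a percentile bootstrap confidence interval on a hold-out set. Several details are assumptions rather than facts from the abstract: the F-score is taken to be F1 (the harmonic mean of sensitivity and positive predictive value), the interval uses the percentile method with 1000 resamples, and `y_true` and `y_pred` are toy labels, not study data.

```python
import random


def metrics(y_true, y_pred):
    """Diagnostic metrics for binary labels (1 = PWID, 0 = not PWID)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    # Assumption: F-score means F1, the harmonic mean of PPV and sensitivity.
    f1 = 2 * ppv * sens / (ppv + sens) if ppv + sens else 0.0
    return {"sensitivity": sens, "specificity": spec, "ppv": ppv, "f1": f1}


def bootstrap_ci(y_true, y_pred, metric="f1", n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample test records with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        resampled = metrics([y_true[i] for i in idx], [y_pred[i] for i in idx])
        stats.append(resampled[metric])
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Toy hold-out labels for illustration only (not data from the study).
y_true = [1] * 8 + [0] * 12
y_pred = [1] * 7 + [0] * 1 + [1] * 1 + [0] * 11

point = metrics(y_true, y_pred)
lower, upper = bootstrap_ci(y_true, y_pred, metric="f1")
```

Resampling only the hold-out records, as here, reflects uncertainty from the size of the test set; it does not capture variability from retraining the model on a different training split.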

CONCLUSIONS

NLP/ML outperformed ICD-based coding algorithms at identifying PWID in electronic health records. NLP/ML models should be considered in identifying cohorts of PWID to improve clinical decision making, health services research, and administrative surveillance.
