Phenotyping people with a history of injecting drug use within electronic medical records using an interactive machine learning approach.

Author information

El-Hayek Carol, Nguyen Thi, Hellard Margaret E, Curtis Michael, Sacks-Davis Rachel, Aung Htein Linn, Asselin Jason, Boyle Douglas I R, Wilkinson Anna, Polkinghorne Victoria, Hocking Jane S, Dunn Adam G

Affiliations

Public Health, Burnet Institute, Melbourne, Australia.

Melbourne School of Population and Global Health, University of Melbourne, Melbourne, Australia.

Publication information

NPJ Digit Med. 2024 Nov 30;7(1):346. doi: 10.1038/s41746-024-01318-y.

Abstract

People with a history of injecting drug use are a priority for eliminating blood-borne viruses and sexually transmissible infections. Identifying them in electronic medical records (EMRs) for disease surveillance is hampered by the sparsity of predictors. This study introduced a novel approach to phenotyping people who have injected drugs using structured EMR data and interactive human-in-the-loop methods. We iteratively trained random forest classifiers, removing important features and adding new positive labels at each iteration. The initial model achieved 92.7% precision and 93.5% recall. Models maintained >90% precision and recall after nine iterations, revealing combinations of less obvious features that influenced predictions. Applied to approximately 1.7 million patients, the final model identified 128,704 (7.7%) patients as potentially having injected drugs, beyond the 50,510 (2.9%) with known indicators of injecting drug use. This process produced explainable models that revealed otherwise hidden combinations of predictors, offering an adaptive approach to the inherent challenge of inconsistently missing data in EMRs.
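The abstract describes the iterative procedure only at a high level. The following is a minimal sketch of the idea, not the authors' code, under several stated assumptions: it uses only the Python standard library, so it swaps the random forest for a simple prevalence-difference scorer (`train_scorer`, hypothetical); the feature names (`injecting_code`, `hcv_antibody`, `opioid_therapy`, `flu_vaccine`), the toy patients, the 0.25 threshold and the simulated `review` function are all made up for illustration. Each iteration trains a model, records precision and recall, lets a "reviewer" confirm new positive labels among predicted positives, and then removes the most important feature so the next model must rely on other predictors.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

# Hypothetical toy representation: each patient is a set of binary EMR indicators.
Patient = FrozenSet[str]


def precision_recall(y_true: List[bool], y_pred: List[bool]) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN); 0.0 when undefined."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def train_scorer(patients: List[Patient], labels: List[bool],
                 features: List[str]) -> Dict[str, float]:
    """Stand-in for a random forest (an assumption, not the study's model):
    weight = prevalence among positives minus prevalence among negatives."""
    pos = [p for p, y in zip(patients, labels) if y]
    neg = [p for p, y in zip(patients, labels) if not y]
    weights = {}
    for f in features:
        rate_pos = sum(f in p for p in pos) / len(pos) if pos else 0.0
        rate_neg = sum(f in p for p in neg) / len(neg) if neg else 0.0
        weights[f] = rate_pos - rate_neg
    return weights


def predict(weights: Dict[str, float], patient: Patient, threshold: float) -> bool:
    return sum(w for f, w in weights.items() if f in patient) >= threshold


def iterate(patients: List[Patient], labels: List[bool], features: List[str],
            review: Callable[[Patient], bool], n_iter: int, threshold: float = 0.25):
    labels = list(labels)   # copy so the caller's labels are untouched
    active = set(features)  # features still available to the model
    history = []
    for it in range(n_iter):
        if not active:
            break
        weights = train_scorer(patients, labels, sorted(active))
        preds = [predict(weights, p, threshold) for p in patients]
        # In-sample metrics for brevity; a real evaluation would use held-out data.
        precision, recall = precision_recall(labels, preds)
        top = max(sorted(weights), key=weights.get)  # most important feature
        history.append((it, top, precision, recall))
        # Human-in-the-loop step: reviewer confirms unlabeled predicted positives.
        for i, patient in enumerate(patients):
            if preds[i] and not labels[i] and review(patient):
                labels[i] = True
        active.discard(top)  # force the next model to use other predictors
    return history, labels


PATIENTS = [
    frozenset({"injecting_code", "hcv_antibody", "opioid_therapy"}),
    frozenset({"injecting_code", "opioid_therapy"}),
    frozenset({"hcv_antibody", "opioid_therapy"}),
    frozenset({"opioid_therapy"}),
    frozenset({"flu_vaccine"}),
    frozenset({"flu_vaccine", "hcv_antibody"}),
]
LABELS = [True, True, False, False, False, False]
FEATURES = sorted(set().union(*PATIENTS))

history, final_labels = iterate(
    PATIENTS, LABELS, FEATURES,
    review=lambda p: "opioid_therapy" in p,  # simulated (hypothetical) reviewer
    n_iter=3,
)
for row in history:
    print(row)
# (0, 'injecting_code', 0.5, 1.0)
# (1, 'opioid_therapy', 1.0, 1.0)
# (2, 'hcv_antibody', 0.0, 0.0)
```

In this toy run, removing the obvious indicator (`injecting_code`) still leaves a usable predictor after the reviewer adds two new positive labels, while removing every informative feature makes precision and recall collapse, which is the kind of signal one would monitor when deciding how many iterations to run. The study itself reports using random forest classifiers and keeping precision and recall above 90% after nine iterations; the scorer, threshold and stopping behaviour here are illustrative assumptions only.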

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8bf9/11608217/c089a7cda40a/41746_2024_1318_Fig1_HTML.jpg