Development of machine learning-based mpox surveillance models in a learning health system.

Authors

Reyes Nieva Harry, Zucker Jason, Tucker Emma, McLean Jacob, DeLaurentis Clare, Gunaratne Shauna, Elhadad Noémie

Affiliations

Department of Biomedical Informatics, Columbia University, New York, New York, USA

Department of Medicine, Harvard Medical School, Boston, Massachusetts, USA.

Publication information

Sex Transm Infect. 2025 May 2. doi: 10.1136/sextrans-2024-056382.

DOI:10.1136/sextrans-2024-056382

PMID:40318862

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12353557/

Abstract

OBJECTIVES

This study aimed to develop robust machine learning (ML)-based and deep learning (DL)-based models capable of detecting mpox cases for surveillance efforts using clinical notes.

METHODS

As part of a learning health system initiative, we conducted a retrospective study of clinical encounters at the Columbia University Irving Medical Center in New York City. We included patients with mpox diagnoses confirmed by PCR testing between 15 May 2022 and 15 October 2022 and three matched controls for each case based on patient age, sex, race, ethnicity and visit month. We trained three mpox surveillance models using: (1) logistic regression with L1 regularisation (least absolute shrinkage and selection operator (LASSO)), (2) ClinicalBERT and (3) ClinicalLongformer. We evaluated model performance using precision, recall, F1 score, area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC) and recall at 80% precision (RP80).
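Recall at 80% precision (RP80) is the least familiar of the metrics listed above, so a small illustration may help. The sketch below assumes the common definition: the highest recall reached at any score threshold whose precision is at least 0.80. The function name `recall_at_precision` and the toy labels and scores are hypothetical and are not taken from the study.

```python
def recall_at_precision(y_true, scores, min_precision=0.80):
    """Highest recall over all thresholds whose precision >= min_precision.

    y_true: iterable of 0/1 labels (1 = case); scores: model scores,
    higher meaning more likely to be a case. Returns 0.0 if no threshold
    reaches the requested precision.
    """
    y_true = list(y_true)
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("need at least one positive label")

    # Rank by descending score; each distinct score acts as one threshold.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)

    best_recall = 0.0
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Consume all items tied at this score before evaluating the threshold.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        precision = tp / (tp + fp)
        recall = tp / total_pos
        if precision >= min_precision:
            best_recall = max(best_recall, recall)
        i = j
    return best_recall


# Toy data (hypothetical): 4 cases and 4 controls.
labels = [1, 1, 0, 1, 0, 1, 0, 0]
scores = [0.95, 0.90, 0.85, 0.80, 0.60, 0.55, 0.30, 0.10]
rp80 = recall_at_precision(labels, scores, min_precision=0.80)
print(f"RP80 on toy data: {rp80:.2f}")
```

A high RP80 means the model can be run at a threshold that keeps false positives low while still catching most cases, which is the property a surveillance workflow typically cares about.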

RESULTS

The study included 228 PCR-confirmed mpox cases and 698 controls. LASSO regression outperformed the DL models with a precision, recall and F1 score of 0.93, AUROC of 0.97, AUPRC of 0.93 and RP80 of 0.89. ClinicalBERT achieved a precision of 0.88, recall of 0.89, F1 score of 0.88 and AUROC of 0.93. ClinicalLongformer achieved a precision of 0.87, recall of 0.88, F1 score of 0.87 and AUROC of 0.92. Phrases related to symptoms (eg, lesions and pain) were among the most predictive features in LASSO regression.
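As a quick sanity check on the figures above, the F1 score can be recomputed as the harmonic mean of precision and recall. This is only a sketch: it assumes the reported precision and recall are single values for the positive class, whereas the abstract does not say whether any averaging across classes was applied, in which case the harmonic-mean relation need not hold exactly.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as stated in the Results paragraph.
reported = {
    "LASSO": (0.93, 0.93, 0.93),
    "ClinicalBERT": (0.88, 0.89, 0.88),
    "ClinicalLongformer": (0.87, 0.88, 0.87),
}

for name, (p, r, f1_reported) in reported.items():
    f1_recomputed = round(f1_score(p, r), 2)
    print(f"{name}: recomputed F1 = {f1_recomputed}, reported F1 = {f1_reported}")
```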

CONCLUSIONS

ML and DL models based on clinical notes show promise for identifying mpox cases. In this study, LASSO regression outperformed DL models and excelled in minimising false positives. These findings highlight the potential for ML and DL methods to support case surveillance for mpox and other infectious diseases. These methods may also prove helpful for flagging missed or delayed diagnoses as part of continuous quality improvement.
