A computable phenotype for patients with SARS-CoV2 testing that occurred outside the hospital.

Author information

Wang Lijing, Zipursky Amy, Geva Alon, McMurry Andrew J, Mandl Kenneth D, Miller Timothy A

Publication information

medRxiv. 2023 Jan 19:2023.01.19.23284738. doi: 10.1101/2023.01.19.23284738.

DOI:10.1101/2023.01.19.23284738

PMID:36711461

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9882620/

Abstract

OBJECTIVE

To identify a cohort of COVID-19 cases, including cases in which evidence of SARS-CoV-2 positivity appears only in the clinical text and not in the structured laboratory data of the electronic health record (EHR).

MATERIALS AND METHODS

Statistical classifiers were trained on feature representations derived from unstructured text in patient EHRs. For training, we used a proxy dataset of patients with COVID-19 polymerase chain reaction (PCR) tests. We selected a model based on its performance on this proxy dataset and applied it to instances without COVID-19 PCR tests. A physician reviewed a sample of these instances to validate the classifier.
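The proxy-dataset setup above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract does not name the features or the classifier, so the sketch swaps in unigram bag-of-words features and a multinomial naive Bayes model with add-one smoothing. The record fields `note` and `pcr_positive`, the helper names, and the toy notes are all hypothetical. Records that have an in-hospital PCR result supply the training labels; records without one form the target set the trained model is applied to.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercased unigram tokens; hyphenated terms such as "sars-cov-2" stay whole.
    # Assumed feature representation, chosen only for illustration.
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial naive Bayes with add-one smoothing (hypothetical stand-in classifier)."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus smoothed log likelihood of each in-vocabulary token.
            score = math.log(self.doc_counts[c] / n_docs)
            for tok in tokenize(text):
                if tok in self.vocab:
                    score += math.log((self.word_counts[c][tok] + 1) / (self.totals[c] + v))
            if score > best_score:
                best_label, best_score = c, score
        return best_label


def split_by_proxy_label(records):
    # Hypothetical record schema: "pcr_positive" is True/False when an in-hospital
    # PCR result exists and None when the patient had no in-hospital test.
    labeled = [(r["note"], r["pcr_positive"]) for r in records if r["pcr_positive"] is not None]
    unlabeled = [r for r in records if r["pcr_positive"] is None]
    return labeled, unlabeled


# Toy notes, invented for illustration only.
records = [
    {"note": "patient reports positive covid test at urgent care", "pcr_positive": True},
    {"note": "sars-cov-2 pcr positive, isolating at home", "pcr_positive": True},
    {"note": "covid test negative, symptoms resolved", "pcr_positive": False},
    {"note": "negative pcr, discharged", "pcr_positive": False},
    {"note": "tested positive for covid at outside clinic", "pcr_positive": None},
    {"note": "outside test negative", "pcr_positive": None},
]

labeled, unlabeled = split_by_proxy_label(records)
model = NaiveBayesTextClassifier().fit([t for t, _ in labeled], [y for _, y in labeled])
predictions = [model.predict(r["note"]) for r in unlabeled]
```

In this framing the training labels come from structured lab data at no annotation cost, which is why manual labeling is avoided; the physician review is then needed only to check how well the model transfers to patients without in-hospital tests.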

RESULTS

On the test split of the proxy dataset, our best classifier obtained an F1 score of 0.56, a precision of 0.60, and a recall of 0.52 for SARS-CoV-2-positive cases. In an expert validation, the classifier correctly labeled 90.8% (79/87) of reviewed cases as COVID-19 positive and 97.8% (91/93) as not SARS-CoV-2 positive. The classifier identified an additional 960 positive cases that had no SARS-CoV-2 lab test in the hospital, and only 177 of those cases carried the ICD-10 code for COVID-19.
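The reported figures can be cross-checked with a few lines of arithmetic. This sketch only recomputes numbers stated in the Results paragraph; the helper names are illustrative. F1 is the harmonic mean of precision and recall, so 0.60 precision and 0.52 recall give an F1 of about 0.56, and 177 of 960 additional positives is about 18.4%, meaning an ICD-10 code alone would not have flagged the remaining 783.

```python
def f1_score(precision, recall):
    # Harmonic mean of precision and recall; defined as 0 when both are 0.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def pct(numerator, denominator):
    # Percentage rounded to one decimal place, as reported in the abstract.
    return round(100 * numerator / denominator, 1)


# Figures taken from the Results paragraph.
f1 = round(f1_score(0.60, 0.52), 2)
positive_agreement = pct(79, 87)
negative_agreement = pct(91, 93)
icd10_coverage = pct(177, 960)
missed_by_icd10 = 960 - 177
```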

DISCUSSION

Performance on the proxy dataset may be worse because these instances sometimes discuss lab tests whose results are still pending. The most predictive features are meaningful and interpretable. Notes rarely mention which type of external test was performed.

CONCLUSION

COVID-19 cases tested outside the hospital can be reliably detected from the text in EHRs. Training on a proxy dataset was a suitable way to develop a high-performing classifier without labor-intensive manual labeling.
