A computable case definition for patients with SARS-CoV2 testing that occurred outside the hospital.

Authors

Wang Lijing, Zipursky Amy R, Geva Alon, McMurry Andrew J, Mandl Kenneth D, Miller Timothy A

Affiliations

Department of Data Science, New Jersey Institute of Technology, Newark, New Jersey, USA.

Computational Health Informatics Program and Department of Emergency Medicine, Boston Children's Hospital, Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA.

Publication information

JAMIA Open. 2023 Jul 5;6(3):ooad047. doi: 10.1093/jamiaopen/ooad047. eCollection 2023 Oct.

DOI:10.1093/jamiaopen/ooad047
PMID:37425487
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10322650/
Abstract

OBJECTIVE

To identify a cohort of COVID-19 cases, including when evidence of virus positivity was only mentioned in the clinical text, not in structured laboratory data in the electronic health record (EHR).

MATERIALS AND METHODS

Statistical classifiers were trained on feature representations derived from unstructured text in patient EHRs. We used a proxy dataset of patients with COVID-19 polymerase chain reaction (PCR) tests for training. We selected a model based on performance on our proxy dataset and applied it to instances without COVID-19 PCR tests. A physician reviewed a sample of these instances to validate the classifier.
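The proxy-label idea above can be illustrated with a minimal sketch. The abstract does not name the classifier or the feature representation, so everything here is an assumption for illustration: a bag-of-words multinomial naive Bayes model stands in for the statistical classifier, the in-hospital PCR result stands in as the training label, and the notes are invented examples, not study data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a note and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def train_naive_bayes(notes, labels):
    """Fit a multinomial naive Bayes model on bag-of-words counts."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter(labels)
    for text, label in zip(notes, labels):
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab


def predict(model, text):
    """Return 1 if the note scores higher under the positive class, else 0."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in (0, 1):
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)
        for token in tokenize(text):
            if token in vocab:  # skip words never seen in training
                # Laplace (add-one) smoothing avoids log(0) for unseen pairs.
                score += math.log(
                    (word_counts[label][token] + 1) / (total_words + len(vocab))
                )
        scores[label] = score
    return 1 if scores[1] > scores[0] else 0


# Invented proxy training set: each label is the (hypothetical) result of an
# in-hospital PCR test for the visit that produced the note.
proxy_notes = [
    "patient tested positive for covid at outside clinic, fever and cough",
    "covid positive per urgent care, presenting with shortness of breath",
    "covid swab negative, here for ankle injury",
    "no known covid exposure, test negative, routine follow up",
]
proxy_labels = [1, 1, 0, 0]

model = train_naive_bayes(proxy_notes, proxy_labels)

# Apply the trained model to a visit that has no in-hospital PCR test.
untested_note = "mother reports child tested positive for covid at pharmacy"
prediction = predict(model, untested_note)
```

The point of the design is that labels come for free from structured lab results, so no manual annotation is needed for training; manual review is reserved for validating a sample of the predictions on untested visits.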

RESULTS

On the test split of the proxy dataset, our best classifier obtained 0.56 F1, 0.6 precision, and 0.52 recall scores for SARS-CoV2 positive cases. In an expert validation, the classifier correctly identified 97.6% (81/84) as COVID-19 positive and 97.8% (91/93) as not SARS-CoV2 positive. The classifier labeled an additional 960 cases as not having SARS-CoV2 lab tests in hospital, and only 177 of those cases had the ICD-10 code for COVID-19.
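As a consistency check on the proxy-dataset scores, F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the reported 0.6 precision and 0.52 recall. The confusion-matrix counts are hypothetical, chosen only because they reproduce those two values; they are not figures from the study.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Compute precision and recall from confusion-matrix counts."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall


def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# F1 from the precision and recall reported in the abstract.
reported_f1 = f1_score(0.6, 0.52)

# Hypothetical counts that yield the same precision and recall.
example_precision, example_recall = precision_recall(
    true_pos=39, false_pos=26, false_neg=36
)
example_f1 = f1_score(example_precision, example_recall)
```

Rounded to two decimals, `reported_f1` matches the 0.56 given above.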

DISCUSSION

Proxy dataset performance may be worse because these instances sometimes include discussion of pending lab tests. The most predictive features are meaningful and interpretable. The type of external test that was performed is rarely mentioned.

CONCLUSION

COVID-19 cases that had testing done outside of the hospital can be reliably detected from the text in EHRs. Training on a proxy dataset was a suitable method for developing a highly performant classifier without labor-intensive labeling efforts.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/956c/10322650/2fa53422b807/ooad047f1.jpg
