Extracting post-acute sequelae of SARS-CoV-2 infection symptoms from clinical notes via hybrid natural language processing.

Authors

Bai Zilong, Xu Zihan, Sun Cong, Zang Chengxi, Bunnell H Timothy, Sinfield Catherine, Rutter Jacqueline, Martinez Aaron Thomas, Bailey L Charles, Weiner Mark, Campion Thomas R, Carton Thomas W, Forrest Christopher B, Kaushal Rainu, Wang Fei, Peng Yifan

Affiliations

Population Health Sciences, Weill Cornell Medicine, New York, USA.

Nemours Children's Health, Wilmington, USA.

Publication information

Npj Health Syst. 2025 Aug 21;2. doi: 10.1038/s44401-025-00033-4.

DOI:10.1038/s44401-025-00033-4
PMID:40958972
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12435580/
Abstract

Accurately and efficiently diagnosing Post-Acute Sequelae of COVID-19 (PASC) remains challenging because its many symptoms evolve over long and variable time intervals. To address this issue, we developed a hybrid natural language processing pipeline that integrates rule-based named entity recognition with BERT-based assertion detection modules for PASC symptom extraction and assertion detection from clinical notes. We developed a comprehensive PASC lexicon with clinical specialists. From 11 health systems of the RECOVER initiative network across the U.S., we curated 160 intake progress notes for model development and evaluation, and collected 47,654 progress notes for a population-level prevalence study. We achieved an average F1 score of 0.82 in one-site internal validation and 0.76 in 10-site external validation for assertion detection. Our pipeline processed each note in 2.448 ± 0.812 seconds on average. Spearman correlation tests showed ρ > 0.83 for positive mentions and ρ > 0.72 for negative ones, both with p < 0.0001. These results demonstrate the effectiveness and efficiency of our models and their potential for improving PASC diagnosis.
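
The two-stage design described in the abstract (lexicon-driven entity recognition followed by assertion detection) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three-concept `LEXICON`, the `NEGATION_CUES` list, and the function names are hypothetical, and the cue-based `classify_assertion` is only a simple stand-in for the BERT-based assertion module, which is not reproduced here.

```python
import re
from dataclasses import dataclass

# Hypothetical mini-lexicon (concept -> surface forms). The PASC lexicon
# built with clinical specialists is far larger and is not shown here.
LEXICON = {
    "fatigue": ["fatigue", "tiredness"],
    "dyspnea": ["shortness of breath", "dyspnea"],
    "cough": ["cough"],
}

# Hypothetical negation cues used by the stand-in assertion classifier.
NEGATION_CUES = ("no", "denies", "without", "negative for")


@dataclass
class Mention:
    symptom: str    # normalized lexicon concept
    text: str       # surface form found in the note
    start: int      # character offsets in the note
    end: int
    assertion: str  # "positive" or "negative"


# Map each lowercase surface form back to its concept.
TERM_TO_CONCEPT = {t.lower(): c for c, terms in LEXICON.items() for t in terms}

# Longest terms first so multi-word terms win over their substrings.
_TERMS = sorted(TERM_TO_CONCEPT, key=len, reverse=True)
PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, _TERMS)) + r")\b",
                     re.IGNORECASE)


def classify_assertion(context: str) -> str:
    """Stand-in for a learned assertion model: label a mention negative
    if any negation cue appears in the text preceding it in the sentence."""
    lowered = context.lower()
    for cue in NEGATION_CUES:
        if re.search(r"\b" + re.escape(cue) + r"\b", lowered):
            return "negative"
    return "positive"


def extract_mentions(note: str) -> list[Mention]:
    """Stage 1: rule-based lexicon matching. Stage 2: assertion labeling."""
    mentions = []
    for m in PATTERN.finditer(note):
        # Crude sentence boundary: the last '.', ';' or newline before the match.
        sent_start = max(note.rfind(ch, 0, m.start()) for ch in ".;\n") + 1
        context = note[sent_start:m.start()]
        mentions.append(Mention(
            symptom=TERM_TO_CONCEPT[m.group(1).lower()],
            text=m.group(1),
            start=m.start(),
            end=m.end(),
            assertion=classify_assertion(context),
        ))
    return mentions


if __name__ == "__main__":
    note = "Patient reports fatigue and cough. Denies shortness of breath."
    for mention in extract_mentions(note):
        print(mention.symptom, mention.assertion)
```

In a pipeline closer to the one described, `classify_assertion` would presumably be replaced by a fine-tuned BERT classifier that receives the sentence with the mention marked; the cue list here only shows where that module plugs in.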
