Constructing a disease database and using natural language processing to capture and standardize free text clinical information.

Affiliations

Public Health Ontario (PHO), Toronto, ON, Canada.

Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada.

Publication information

Sci Rep. 2023 May 26;13(1):8591. doi: 10.1038/s41598-023-35482-0.

DOI:10.1038/s41598-023-35482-0
PMID:37237101
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10215040/
Abstract

The ability to extract critical information about an infectious disease in a timely manner is critical for population health research. The lack of procedures for mining large amounts of health data is a major impediment. The goal of this research is to use natural language processing (NLP) to extract key information (clinical factors, social determinants of health) from free text. The proposed framework describes database construction, NLP modules for locating clinical and non-clinical (social determinants) information, and a detailed evaluation protocol for evaluating results and demonstrating the effectiveness of the proposed framework. The use of COVID-19 case reports is demonstrated for data construction and pandemic surveillance. The proposed approach outperforms benchmark methods in F1-score by about 1-3%. A thorough examination reveals the disease's presence as well as the frequency of symptoms in patients. The findings suggest that prior knowledge gained through transfer learning can be useful when researching infectious diseases with similar presentations in order to accurately predict patient outcomes.
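The abstract reports results as F1-score differences but does not spell out the scoring rule on this page. As a rough illustration only, the sketch below computes precision, recall, and F1 for extracted entities under exact span-and-label matching, which is a common convention and an assumption here, not necessarily the paper's evaluation protocol. The function name `entity_f1`, the label names, and the offsets are all hypothetical.

```python
from typing import Set, Tuple

# An entity is (start_offset, end_offset, label); label names are illustrative only.
Entity = Tuple[int, int, str]


def entity_f1(gold: Set[Entity], predicted: Set[Entity]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact span-and-label matching."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical annotations for one case report (offsets and labels are made up).
gold = {
    (0, 5, "SYMPTOM"),
    (10, 18, "SYMPTOM"),
    (25, 36, "SOCIAL_DETERMINANT"),
    (40, 48, "CONDITION"),
}
predicted = {
    (0, 5, "SYMPTOM"),
    (10, 18, "SYMPTOM"),
    (25, 36, "CONDITION"),  # right span, wrong label: counts as an error
    (50, 55, "SYMPTOM"),    # spurious span
}

precision, recall, f1 = entity_f1(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

With two of four predictions matching two of four gold entities, precision, recall, and F1 all come out to 0.50. Exact matching is the strictest choice; a partial-overlap criterion would score the mislabeled span differently.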

Figures

Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7367/10219971/1faa0e09d966/41598_2023_35482_Fig1_HTML.jpg
Fig. 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7367/10219971/2972e32b4b9a/41598_2023_35482_Fig2_HTML.jpg
Fig. 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7367/10219971/6c24dfaac1d5/41598_2023_35482_Fig3_HTML.jpg
Fig. 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7367/10219971/e00463d43721/41598_2023_35482_Fig4_HTML.jpg
Fig. 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7367/10219971/283e5ab557be/41598_2023_35482_Fig5_HTML.jpg
