Natural Language Processing for Information Extraction of Gastric Diseases and Its Application in Large-Scale Clinical Research.

Authors

Song Gyuseon, Chung Su Jin, Seo Ji Yeon, Yang Sun Young, Jin Eun Hyo, Chung Goh Eun, Shim Sung Ryul, Sa Soonok, Hong Moongi Simon, Kim Kang Hyun, Jang Eunchan, Lee Chae Won, Bae Jung Ho, Han Hyun Wook

Affiliations

Department of Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea.

Institute for Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea.

Publication information

J Clin Med. 2022 May 24;11(11):2967. doi: 10.3390/jcm11112967.

DOI: 10.3390/jcm11112967
PMID: 35683353
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9181010/
Abstract

The utility of clinical information from esophagogastroduodenoscopy (EGD) reports has been limited because of its unstructured narrative format. We developed a natural language processing (NLP) pipeline that automatically extracts information about gastric diseases from unstructured EGD reports and demonstrated its applicability in clinical research. An NLP pipeline was developed using 2000 EGD and associated pathology reports that were retrieved from a single healthcare center. The pipeline extracted clinical information, including the presence, location, and size, for 10 gastric diseases from the EGD reports. It was validated with 1000 EGD reports by evaluating sensitivity, positive predictive value (PPV), accuracy, and F1 score. The pipeline was applied to 248,966 EGD reports from 2010-2019 to identify patient demographics and clinical information for 10 gastric diseases. For gastritis information extraction, we achieved an overall sensitivity, PPV, accuracy, and F1 score of 0.966, 0.972, 0.996, and 0.967, respectively. Other gastric diseases, such as ulcers, and neoplastic diseases achieved an overall sensitivity, PPV, accuracy, and F1 score of 0.975, 0.982, 0.999, and 0.978, respectively. The study of EGD data of over 10 years revealed the demographics of patients with gastric diseases by sex and age. In addition, the study identified the extent and locations of gastritis and other gastric diseases, respectively. We demonstrated the feasibility of the NLP pipeline providing an automated extraction of gastric disease information from EGD reports. Incorporating the pipeline can facilitate large-scale clinical research to better understand gastric diseases.
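The four validation metrics named in the abstract can all be derived from one table of true/false positives and negatives. The sketch below is a minimal illustration of those standard definitions, not the authors' evaluation code. The class and function names are hypothetical, and the example counts are invented for illustration; they are not taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from comparing extracted labels with a manual reference standard."""
    tp: int  # finding present in the reference and extracted
    fp: int  # finding extracted but absent from the reference
    fn: int  # finding present in the reference but missed
    tn: int  # finding absent from both


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN instead of raising when a metric is undefined.
    return numerator / denominator if denominator else math.nan


def sensitivity(c: ConfusionCounts) -> float:
    """Share of true findings that were extracted (recall)."""
    return _ratio(c.tp, c.tp + c.fn)


def ppv(c: ConfusionCounts) -> float:
    """Share of extracted findings that were correct (precision)."""
    return _ratio(c.tp, c.tp + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    """Share of all decisions, positive and negative, that were correct."""
    return _ratio(c.tp + c.tn, c.tp + c.fp + c.fn + c.tn)


def f1_score(c: ConfusionCounts) -> float:
    """Harmonic mean of sensitivity and PPV."""
    return _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn)


# Illustrative counts only (hypothetical, 1000 decisions in total).
example = ConfusionCounts(tp=90, fp=10, fn=5, tn=895)
for name, fn_ in [("sensitivity", sensitivity), ("PPV", ppv),
                  ("accuracy", accuracy), ("F1", f1_score)]:
    print(f"{name}: {fn_(example):.3f}")
```

With these invented counts, accuracy comes out higher than F1 because the many true negatives count toward accuracy but not toward F1. The same ordering appears in the reported figures (accuracy 0.996 versus F1 0.967 for gastritis). The abstract does not state how the overall values were aggregated across disease categories, so they need not satisfy these single-table identities exactly.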

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9f3d/9181010/76c5e3358ef3/jcm-11-02967-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9f3d/9181010/7fe316a710cb/jcm-11-02967-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9f3d/9181010/99866650bbf9/jcm-11-02967-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9f3d/9181010/acfbcf26c606/jcm-11-02967-g004.jpg
