Automated ancillary cancer history classification for mesothelioma patients from free-text clinical reports.

Authors

Wilson Richard A, Chapman Wendy W, Defries Shawn J, Becich Michael J, Chapman Brian E

Affiliation

Department of Biomedical Informatics, University of Pittsburgh, 200 Meyran Avenue, Pittsburgh, PA; USA.

Publication

J Pathol Inform. 2010 Oct 11;1:24. doi: 10.4103/2153-3539.71065.

DOI:10.4103/2153-3539.71065
PMID:21031012
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2956176/
Abstract

BACKGROUND

Clinical records are often unstructured, free-text documents that create information extraction challenges and costs. Healthcare delivery and research organizations, such as the National Mesothelioma Virtual Bank, require the aggregation of both structured and unstructured data types. Natural language processing offers techniques for automatically extracting information from unstructured, free-text documents.

METHODS

Five hundred and eight history and physical reports from mesothelioma patients were split into a development set (208) and a test set (300). A reference standard was developed in which experts annotated each report for the patient's personal history of ancillary cancer and family history of any cancer. The Hx application was developed to process reports, extract relevant features, perform reference resolution, and classify each report with regard to cancer history. Two information extraction methods, Dynamic-Window and ConText, were evaluated, and Hx's classifications using each method were measured against the reference standard. The average Cohen's weighted kappa served as the human benchmark for evaluating the system.
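
The human benchmark above is an average of Cohen's weighted kappa values. The abstract does not state the weighting scheme or the label set, so the following is only a minimal sketch assuming linear disagreement weights w_ij = |i - j| / (k - 1) over ordinal integer labels (for example a hypothetical 0 = no history, 1 = uncertain, 2 = history present). Weighted kappa is one minus the ratio of observed to chance-expected weighted disagreement.

```python
import math
from collections import Counter
from typing import Sequence


def weighted_kappa(rater_a: Sequence[int], rater_b: Sequence[int], n_categories: int) -> float:
    """Cohen's weighted kappa with linear weights, assuming ordinal labels 0..n_categories-1."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    if n_categories < 2:
        raise ValueError("need at least two categories")
    n = len(rater_a)
    observed = Counter(zip(rater_a, rater_b))  # joint counts of (a, b) label pairs
    marg_a, marg_b = Counter(rater_a), Counter(rater_b)  # per-rater marginal counts

    obs_disagree = exp_disagree = 0.0
    for i in range(n_categories):
        for j in range(n_categories):
            w = abs(i - j) / (n_categories - 1)  # linear disagreement weight (assumption)
            obs_disagree += w * observed[(i, j)] / n
            exp_disagree += w * (marg_a[i] / n) * (marg_b[j] / n)

    if exp_disagree == 0.0:
        return math.nan  # kappa is undefined when chance disagreement is zero
    return 1.0 - obs_disagree / exp_disagree
```

Averaging this value over every annotator pair (for example with `statistics.mean`) gives an average kappa of the kind the abstract uses as its benchmark; with two categories the linear-weighted form reduces to ordinary Cohen's kappa.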

RESULTS

Hx achieved high overall accuracy, scoring 96.2% with each method. F-measures for the Dynamic-Window and ConText methods were 91.8% and 91.6%, respectively, comparable to the human benchmark of 92.8%. For personal history classification, Dynamic-Window scored highest (89.2%); for family history classification, ConText scored highest (97.6%). In both tasks, the methods were comparable to the corresponding human benchmarks of 88.3% and 97.2%.
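
Accuracy and F-measure are standard metrics, but the abstract does not say which class was treated as positive or how results were pooled across the two history types. The sketch below assumes balanced F-measure, F1 = 2PR / (P + R), computed for one designated positive label per task; the label strings used with it are hypothetical.

```python
from typing import Hashable, Sequence, Tuple


def _check(reference: Sequence[Hashable], predicted: Sequence[Hashable]) -> None:
    if not reference or len(reference) != len(predicted):
        raise ValueError("need non-empty label sequences of equal length")


def accuracy(reference: Sequence[Hashable], predicted: Sequence[Hashable]) -> float:
    """Fraction of reports whose predicted class matches the reference standard."""
    _check(reference, predicted)
    return sum(r == p for r, p in zip(reference, predicted)) / len(reference)


def precision_recall_f1(
    reference: Sequence[Hashable], predicted: Sequence[Hashable], positive: Hashable
) -> Tuple[float, float, float]:
    """Precision, recall and balanced F-measure for one designated positive label."""
    _check(reference, predicted)
    pairs = list(zip(reference, predicted))
    tp = sum(r == positive and p == positive for r, p in pairs)
    fp = sum(r != positive and p == positive for r, p in pairs)
    fn = sum(r == positive and p != positive for r, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Accuracy rewards agreement on negatives as well as positives, which is one reason it can sit well above F-measure when most reports carry no positive history.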

CONCLUSION

We evaluated an automated application's performance in classifying a mesothelioma patient's personal and family history of cancer from clinical reports. To do so, the Hx application must process reports, identify cancer concepts, distinguish the known mesothelioma from ancillary cancers, recognize negation, perform reference resolution, and determine the experiencer. Results indicated that both information extraction methods tested were dependent on the domain-specific lexicon and on negation extraction. We showed that the more general method, ConText, performed as well as our task-specific method. Although Dynamic-Window could be modified to retrieve other concepts, ConText is more robust and performs better on inconclusive concepts. Hx could greatly improve and expedite the extraction of data from free-text clinical records for a variety of research or healthcare delivery organizations.
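
The steps listed above (identifying cancer concepts, separating the known mesothelioma from ancillary cancers, recognizing negation, determining the experiencer) can be illustrated with a heavily simplified, ConText-style sketch. It is not the Hx implementation: the lexicons are tiny hypothetical stand-ins, only forward-looking triggers are handled, a trigger's scope runs to the end of the sentence or to a termination term such as "but", and reference resolution is omitted. A report is labeled as having a personal ancillary cancer history if any non-negated, patient-experienced cancer other than mesothelioma is mentioned, and a family cancer history if any non-negated, family-experienced cancer is mentioned.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy lexicons for illustration; a real system needs far larger, curated lists.
CANCER_TERMS = ["mesothelioma", "breast cancer", "lung cancer", "colon cancer", "lymphoma", "melanoma"]
NEGATION_TRIGGERS = ["no", "denies", "negative for"]
FAMILY_TRIGGERS = ["mother", "father", "sister", "brother", "family history"]
TERMINATION_TERMS = ["but", "however"]

Hit = Tuple[int, int, str]  # (start offset, end offset, matched term)


@dataclass
class Mention:
    term: str
    negated: bool
    experiencer: str  # "patient" or "family"


def _find(terms: Sequence[str], sentence: str) -> List[Hit]:
    """Return every whole-word occurrence of any term, sorted by position."""
    hits = []
    for term in terms:
        for m in re.finditer(r"\b" + re.escape(term) + r"\b", sentence):
            hits.append((m.start(), m.end(), term))
    return sorted(hits)


def _in_forward_scope(triggers: List[Hit], terminations: List[Hit], pos: int) -> bool:
    """True if a trigger ends before pos with no termination term in between."""
    for _, t_end, _ in triggers:
        if t_end <= pos and not any(t_end <= s < pos for s, _, _ in terminations):
            return True
    return False


def annotate(report: str) -> List[Mention]:
    """Tag each cancer mention with negation and experiencer, sentence by sentence."""
    mentions = []
    # Crude sentence split on periods, semicolons and newlines (an assumption).
    for sentence in re.split(r"[.;\n]", report.lower()):
        stops = _find(TERMINATION_TERMS, sentence)
        negations = _find(NEGATION_TRIGGERS, sentence)
        family = _find(FAMILY_TRIGGERS, sentence)
        for start, _, term in _find(CANCER_TERMS, sentence):
            mentions.append(Mention(
                term=term,
                negated=_in_forward_scope(negations, stops, start),
                experiencer="family" if _in_forward_scope(family, stops, start) else "patient",
            ))
    return mentions


def classify(report: str) -> Dict[str, bool]:
    """Report-level labels: personal ancillary cancer history and family cancer history."""
    mentions = [m for m in annotate(report) if not m.negated]
    return {
        # The known mesothelioma is not an ancillary cancer for the patient.
        "personal_ancillary_cancer": any(
            m.experiencer == "patient" and m.term != "mesothelioma" for m in mentions
        ),
        # Family history counts any cancer, including mesothelioma.
        "family_cancer": any(m.experiencer == "family" for m in mentions),
    }
```

For example, in "No family history of cancer but mother had lymphoma." the termination term "but" cuts off the scope of "no" and "family history", so only "mother" applies and the lymphoma counts as family history. The full ConText algorithm also handles backward-looking triggers, pseudo-triggers and temporality, all of which are left out here.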

Similar articles

1
Automated ancillary cancer history classification for mesothelioma patients from free-text clinical reports.
J Pathol Inform. 2010 Oct 11;1:24. doi: 10.4103/2153-3539.71065.
2
Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.
Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.
3
[A customized method for information extraction from unstructured text data in the electronic medical records].
Beijing Da Xue Xue Bao Yi Xue Ban. 2018 Apr 18;50(2):256-263.
4
Automated Travel History Extraction From Clinical Notes for Informing the Detection of Emergent Infectious Disease Events: Algorithm Development and Validation.
JMIR Public Health Surveill. 2021 Mar 24;7(3):e26719. doi: 10.2196/26719.
5
Natural Language Processing for Surveillance of Cervical and Anal Cancer and Precancer: Algorithm Development and Split-Validation Study.
JMIR Med Inform. 2020 Nov 3;8(11):e20826. doi: 10.2196/20826.
6
Designing an openEHR-Based Pipeline for Extracting and Standardizing Unstructured Clinical Data Using Natural Language Processing.
Methods Inf Med. 2020 Dec;59(S 02):e64-e78. doi: 10.1055/s-0040-1716403. Epub 2020 Oct 14.
7
Automated Outcome Classification of Computed Tomography Imaging Reports for Pediatric Traumatic Brain Injury.
Acad Emerg Med. 2016 Feb;23(2):171-8. doi: 10.1111/acem.12859. Epub 2016 Jan 14.
8
Text mining brain imaging reports.
J Biomed Semantics. 2019 Nov 12;10(Suppl 1):23. doi: 10.1186/s13326-019-0211-7.
9
Extracting Medical Information From Free-Text and Unstructured Patient-Generated Health Data Using Natural Language Processing Methods: Feasibility Study With Real-world Data.
JMIR Form Res. 2023 Mar 7;7:e43014. doi: 10.2196/43014.
10
Automated classification of cancer morphology from Italian pathology reports using Natural Language Processing techniques: A rule-based approach.
J Biomed Inform. 2021 Apr;116:103712. doi: 10.1016/j.jbi.2021.103712. Epub 2021 Feb 18.

Cited by

1
Assessment of Electronic Health Record for Cancer Research and Patient Care Through a Scoping Review of Cancer Natural Language Processing.
JCO Clin Cancer Inform. 2022 Jul;6:e2200006. doi: 10.1200/CCI.22.00006.
2
Comparison of Machine-Learning Algorithms for the Prediction of Current Procedural Terminology (CPT) Codes from Pathology Reports.
J Pathol Inform. 2022 Jan 5;13:3. doi: 10.4103/jpi.jpi_52_21. eCollection 2022.
Determining Onset for Familial Breast and Colorectal Cancer from Family History Comments in the Electronic Health Record.
AMIA Jt Summits Transl Sci Proc. 2019 May 6;2019:173-181. eCollection 2019.
4
A Frame-Based NLP System for Cancer-Related Information Extraction.
AMIA Annu Symp Proc. 2018 Dec 5;2018:1524-1533. eCollection 2018.
5
Detecting Evidence of Intra-abdominal Surgical Site Infections from Radiology Reports Using Natural Language Processing.
AMIA Annu Symp Proc. 2018 Apr 16;2017:515-524. eCollection 2017.
6
Developing a web-based SKOS editor.
J Biomed Semantics. 2016 Apr 4;7:5. doi: 10.1186/s13326-015-0043-z. eCollection 2016.
7
Use of structured and unstructured data to identify contraceptive use in women veterans.
Perspect Health Inf Manag. 2013 Jul 1;10(Summer):1e. Print 2013.
8
Extracting and integrating data from entire electronic health records for detecting colorectal cancer cases.
AMIA Annu Symp Proc. 2011;2011:1564-72. Epub 2011 Oct 22.
9
Document-level classification of CT pulmonary angiography reports based on an extension of the ConText algorithm.
J Biomed Inform. 2011 Oct;44(5):728-37. doi: 10.1016/j.jbi.2011.03.011. Epub 2011 Apr 1.