Obtaining Knowledge in Pathology Reports Through a Natural Language Processing Approach With Classification, Named-Entity Recognition, and Relation-Extraction Heuristics.

Authors

Oliwa Tomasz, Maron Steven B, Chase Leah M, Lomnicki Samantha, Catenacci Daniel V T, Furner Brian, Volchenboum Samuel L

Affiliations

The University of Chicago, Chicago, IL.

Memorial Sloan Kettering Cancer Center, New York, NY.

Publication information

JCO Clin Cancer Inform. 2019 Aug;3:1-8. doi: 10.1200/CCI.19.00008.

Abstract

PURPOSE

Robust institutional tumor banks depend on continuous sample curation; without it, subsequent biopsy or resection specimens are overlooked after initial enrollment. Curation automation is hindered by semistructured free-text clinical pathology notes, which complicate data abstraction. Our motivation is to develop a natural language processing method that dynamically identifies the existing pathology specimen elements needed to locate specimens for future use, in a manner that can be re-implemented by other institutions.

PATIENTS AND METHODS

Pathology reports from patients with gastroesophageal cancer enrolled in The University of Chicago GI oncology tumor bank were used to train and validate a novel composite natural language processing-based pipeline with a supervised machine learning classification step to separate notes into internal (primary review) and external (consultation) reports; a named-entity recognition step to obtain label (accession number), location, date, and sublabels (block identifiers); and a results proofreading step.
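To make the named-entity and relation-grouping steps concrete, the following is a minimal sketch of how labels, dates, and sublabels might be pulled from a note and grouped into sample records. It is a rule-based stand-in, not the authors' trained models: the regular expressions, the `Sample` class, and the `extract_samples` function are all hypothetical, real accession and block formats differ between institutions, and the location entity is omitted for brevity.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical patterns; actual accession, date, and block formats vary.
ACCESSION = re.compile(r"\b[A-Z]{1,2}\d{2}-\d{3,6}\b")   # e.g. "S18-12345"
DATE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")          # e.g. "3/14/2018"
BLOCK = re.compile(r"\bblock\s+([A-Z]\d{1,2})\b", re.IGNORECASE)


@dataclass
class Sample:
    label: str                                   # accession number
    date: Optional[str] = None                   # first date found after the label
    sublabels: List[str] = field(default_factory=list)  # block identifiers


def extract_samples(text: str) -> List[Sample]:
    """Group each accession number with the date and blocks that follow it."""
    hits = list(ACCESSION.finditer(text))
    samples = []
    for i, hit in enumerate(hits):
        # Relation heuristic (an assumption): an entity belongs to the
        # nearest preceding label, so search up to the next label only.
        end = hits[i + 1].start() if i + 1 < len(hits) else len(text)
        window = text[hit.end():end]
        date = DATE.search(window)
        samples.append(Sample(
            label=hit.group(),
            date=date.group() if date else None,
            sublabels=[b.upper() for b in BLOCK.findall(window)],
        ))
    return samples


note = ("Received S18-12345 collected 3/14/2018, block A1 and block B2. "
        "Also S17-00042 from 11/2/2017.")
for sample in extract_samples(note):
    print(sample)
```

In a pipeline like the one described, a report would first be routed by the internal/external classifier, and an extraction step of this kind would then run with class-specific rules or models.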

RESULTS

We analyzed 188 pathology reports, including 82 internal reports and 106 external consult reports, and successfully extracted named entities grouped as sample information (label, date, location). Our approach identified up to 24 additional unique samples in external consult notes that could have been overlooked. Our classification model obtained 100% accuracy on the basis of 10-fold cross-validation. Precision, recall, and F1 for class-specific named-entity recognition models show strong performance.
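For readers unfamiliar with the evaluation terms, the sketch below shows how precision, recall, and F1 are computed from entity-level counts, and how 188 reports could be partitioned into 10 folds. The function names and the example counts are illustrative assumptions, not figures from the study, and the contiguous fold split is a simplification; the abstract does not state whether the authors shuffled or stratified their folds.

```python
from typing import List, Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def k_fold_indices(n: int, k: int) -> List[range]:
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most one."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # first `extra` folds get one more item
        folds.append(range(start, start + size))
        start += size
    return folds


# Made-up counts for illustration only.
print(precision_recall_f1(tp=8, fp=2, fn=2))

# 188 reports in 10 folds: each fold is held out once for validation.
print([len(fold) for fold in k_fold_indices(188, 10)])
```

Each fold serves once as the held-out validation set while the remaining nine train the model, and the reported accuracy is aggregated across the ten runs.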

CONCLUSION

Through a combination of natural language processing and machine learning, we devised a re-implementable and automated approach that can accurately extract specimen attributes from semistructured pathology notes to dynamically populate a tumor registry.
