Natural language processing systems for pathology parsing in limited data environments with uncertainty estimation.

Authors

Odisho Anobel Y, Park Briton, Altieri Nicholas, DeNero John, Cooperberg Matthew R, Carroll Peter R, Yu Bin

Affiliations

Department of Urology, UCSF Helen Diller Family Comprehensive Cancer Center, San Francisco, California, USA.

Department of Statistics, University of California, Berkeley, California, USA.

Publication information

JAMIA Open. 2020 Oct 14;3(3):431-438. doi: 10.1093/jamiaopen/ooaa029. eCollection 2020 Oct.

DOI:10.1093/jamiaopen/ooaa029
PMID:33381748
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7751177/
Abstract

OBJECTIVE

Cancer is a leading cause of death, but much of the diagnostic information is stored as unstructured data in pathology reports. We aim to improve uncertainty estimates of machine learning-based pathology parsers and evaluate performance in low data settings.

MATERIALS AND METHODS

Our data comes from the Urologic Outcomes Database at UCSF which includes 3232 annotated prostate cancer pathology reports from 2001 to 2018. We approach 17 separate information extraction tasks, involving a wide range of pathologic features. To handle the diverse range of fields, we required 2 statistical models, a document classification method for pathologic features with a small set of possible values and a token extraction method for pathologic features with a large set of values. For each model, we used isotonic calibration to improve the model's estimates of its likelihood of being correct.
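As an illustration of the calibration step, the following is a minimal pure-Python sketch of isotonic calibration using the pool-adjacent-violators algorithm. It is not the authors' implementation: the function names `fit_isotonic` and `calibrate`, the toy scores, and the step-function lookup (no interpolation between blocks) are all assumptions made for this example.

```python
from bisect import bisect_left
from itertools import groupby


def fit_isotonic(scores, correct):
    """Fit a non-decreasing step function from raw confidence to P(correct).

    Pool-adjacent-violators: sort by score, then merge neighbouring blocks
    whenever an earlier block has a higher mean accuracy than a later one.
    Returns (uppers, values): the upper score bound and the calibrated
    value of each block.
    """
    pairs = sorted(zip(scores, correct))
    blocks = []  # each block: [sum_of_labels, count, upper_score_bound]
    # Group tied scores first so identical scores always share one block.
    for score, group in groupby(pairs, key=lambda p: p[0]):
        labels = [y for _, y in group]
        blocks.append([float(sum(labels)), len(labels), score])
        # Merge backwards while monotonicity is violated.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count, upper = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = upper
    uppers = [b[2] for b in blocks]
    values = [b[0] / b[1] for b in blocks]
    return uppers, values


def calibrate(score, uppers, values):
    """Map a raw score to the value of the first block whose bound covers it."""
    idx = bisect_left(uppers, score)
    return values[min(idx, len(values) - 1)]


# Hypothetical held-out data: raw model confidence and whether the
# prediction was correct (1) or not (0).
raw_scores = [0.55, 0.60, 0.70, 0.80, 0.90, 0.95, 0.99, 0.99]
was_correct = [0, 1, 0, 1, 1, 0, 1, 1]

uppers, values = fit_isotonic(raw_scores, was_correct)
print(calibrate(0.85, uppers, values))
```

The fitted mapping is monotone by construction, so a higher raw confidence never yields a lower calibrated probability. In practice the mapping would be fit on held-out data, separate from the data used to train the model.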

RESULTS

Our best document classifier method, a convolutional neural network, achieves a weighted F1 score of 0.97 averaged over 12 fields and our best extraction method achieves an accuracy of 0.93 averaged over 5 fields. The performance saturates as a function of dataset size with as few as 128 data points. Furthermore, while our document classifier methods have reliable uncertainty estimates, our extraction-based methods do not, but after isotonic calibration, expected calibration error drops to below 0.03 for all extraction fields.
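For readers unfamiliar with expected calibration error (ECE), the sketch below shows one common way to compute it: predictions are grouped into equal-width confidence bins, and the gap between mean confidence and observed accuracy in each bin is averaged, weighted by bin size. The binning scheme, the default of 10 bins, and the function name are assumptions for this example; the abstract does not state which variant the authors used.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average of |accuracy - mean confidence| over equal-width bins."""
    bins = [[] for _ in range(n_bins)]
    for conf, label in zip(confidences, correct):
        # A confidence of exactly 1.0 falls into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, label))

    n = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(y for _, y in members) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece


# Hypothetical predictions: four at 0.95 confidence (three correct)
# and two at 0.55 confidence (one correct).
confs = [0.95, 0.95, 0.95, 0.95, 0.55, 0.55]
hits = [1, 1, 1, 0, 1, 0]
print(expected_calibration_error(confs, hits))
```

An ECE near zero means the reported confidence tracks observed accuracy closely. This is the sense in which the extraction fields are described as falling below 0.03 after calibration.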

CONCLUSIONS

We find that when applying machine learning to pathology parsing, large datasets may not always be needed, and that calibration methods can improve the reliability of uncertainty estimates.
