
Large Scale Semi-Automated Labeling of Routine Free-Text Clinical Records for Deep Learning.

Affiliations

Department of Radiology and Biomedical Imaging, University of California, San Francisco, CA, USA.

Institute for Computational Health Sciences, University of California, San Francisco, CA, USA.

Publication information

J Digit Imaging. 2019 Feb;32(1):30-37. doi: 10.1007/s10278-018-0105-8.

DOI:10.1007/s10278-018-0105-8
PMID:30128778
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6382632/
Abstract

Breast cancer is a leading cause of cancer death among women in the USA. Screening mammography is effective in reducing mortality, but has a high rate of unnecessary recalls and biopsies. While deep learning can be applied to mammography, large-scale labeled datasets, which are difficult to obtain, are required. We aim to remove many barriers of dataset development by automatically harvesting data from existing clinical records using a hybrid framework combining traditional NLP and IBM Watson. An expert reviewer manually annotated 3521 breast pathology reports with one of four outcomes: left positive, right positive, bilateral positive, negative. Traditional NLP techniques using seven different machine learning classifiers were compared to IBM Watson's automated natural language classifier. Techniques were evaluated using precision, recall, and F-measure. Logistic regression outperformed all other traditional machine learning classifiers and was used for subsequent comparisons. Both traditional NLP and Watson's NLC performed well for cases under 1024 characters with weighted average F-measures above 0.96 across all classes. Performance of traditional NLP was lower for cases over 1024 characters with an F-measure of 0.83. We demonstrate a hybrid framework using traditional NLP techniques combined with IBM Watson to annotate over 10,000 breast pathology reports for development of a large-scale database to be used for deep learning in mammography. Our work shows that traditional NLP and IBM Watson perform extremely well for cases under 1024 characters and can accelerate the rate of data annotation.
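The evaluation described in the abstract (a weighted-average F-measure over four outcome classes, reported separately for reports under and over 1024 characters) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names, the toy records, the choice of support-weighted averaging (the abstract does not say how the average was weighted), and the treatment of reports of exactly 1024 characters as "long". It does not reproduce the authors' classifiers or data.

```python
from collections import Counter

# The four outcome classes named in the abstract.
LABELS = ("left positive", "right positive", "bilateral positive", "negative")
CHAR_LIMIT = 1024  # length threshold named in the abstract


def weighted_f_measure(y_true, y_pred):
    """Average per-class F1, weighting each class by its true support.

    Assumption: 'weighted average' means support weighting.
    """
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label in support:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += f1 * support[label] / total
    return score


def split_by_length(records, limit=CHAR_LIMIT):
    """Partition (text, true_label, predicted_label) records by text length.

    Assumption: a report of exactly `limit` characters counts as long.
    """
    short = [r for r in records if len(r[0]) < limit]
    long_ = [r for r in records if len(r[0]) >= limit]
    return short, long_


def f_measure_by_length(records, limit=CHAR_LIMIT):
    """Return the weighted F-measure for the short and the long group."""
    result = {}
    for name, group in zip(("short", "long"), split_by_length(records, limit)):
        if group:
            y_true = [r[1] for r in group]
            y_pred = [r[2] for r in group]
            result[name] = weighted_f_measure(y_true, y_pred)
    return result


# Hypothetical toy records: (report text, expert label, classifier output).
toy_records = [
    ("benign breast tissue", "negative", "negative"),
    ("left breast: invasive carcinoma", "left positive", "left positive"),
    ("right breast: carcinoma in situ", "right positive", "negative"),
    ("x" * 2000, "bilateral positive", "bilateral positive"),
    ("y" * 1500, "negative", "left positive"),
]

if __name__ == "__main__":
    print(f_measure_by_length(toy_records))
```

Reporting the two length groups separately mirrors how the abstract presents its results, where the gap between short and long reports is the main finding.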
