Natural Language Processing Approaches for Automated Multilevel and Multiclass Classification of Breast Lesions on Free-Text Cytopathology Reports.

Affiliations

Department of Computer Science and Engineering, JSS Science and Technology University, Mysuru, Karnataka, India.

Department of Information Science and Engineering, JSS Science and Technology University, Mysuru, Karnataka, India.

Publication information

JCO Clin Cancer Inform. 2022 Sep;6:e2200036. doi: 10.1200/CCI.22.00036.

DOI:10.1200/CCI.22.00036
PMID:36103641
Abstract

PURPOSE

The extensive growth and use of electronic health records (EHRs), together with the expanding medical literature, have created major opportunities to automate the extraction of relevant clinical information that supports concise and effective clinical decision-making. However, processing such information has traditionally depended on labor-intensive manual review. That review is prone to errors arising from fatigue, oversight, and interobserver variability. Hence, this study aims to process EHRs and perform multilevel and multiclass classification by extracting dominant characteristic features sufficient to detect and differentiate various types of breast lesions.

PATIENTS AND METHODS

In this study, the authors used unstructured EHRs of breast lesions obtained through fine-needle aspiration cytology. The raw text was normalized into a structured tabular form and converted to scores through sentiment analysis, which determines the overall polarity, or class label, of each EHR. Two supervised machine learning approaches were used to classify the collected data set of 2,879 records: a random forest and a feed-forward neural network trained with the Levenberg-Marquardt training function. The records were split 80:20 into training and testing sets.
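The Levenberg-Marquardt training function named above is a standard optimizer for least-squares objectives. The block below shows its generic, textbook weight update for reference. The abstract does not give the authors' settings, such as the initial damping value, how it was adjusted, or the network architecture, so none of those are implied here.

```latex
% e_k: vector of network output errors over the training samples at iteration k
% J_k = \partial e / \partial w evaluated at w_k: Jacobian of the errors w.r.t. the weights
% \mu_k > 0: damping parameter; I: identity matrix
\mathbf{w}_{k+1} = \mathbf{w}_k - \left(\mathbf{J}_k^{\top}\mathbf{J}_k + \mu_k \mathbf{I}\right)^{-1}\mathbf{J}_k^{\top}\mathbf{e}_k
```

When $\mu_k$ is large, the step approaches a short gradient-descent step, $-\tfrac{1}{\mu_k}\mathbf{J}_k^{\top}\mathbf{e}_k$. When $\mu_k$ is near zero, it becomes the Gauss-Newton step. In the usual scheme, $\mu_k$ is decreased after a step that lowers the error and increased otherwise.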
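The methods describe three steps: normalizing raw report text into a table, scoring it by sentiment analysis to obtain a total polarity that decides the class label, and splitting the records 80:20. The abstract gives no lexicon, scoring rule, thresholds, or class names, so the Python sketch below invents all of them for illustration. `LEXICON`, `THRESHOLDS`, `to_row`, `polarity`, `assign_class`, and `split_80_20` are hypothetical, and the labels `class_1` to `class_5` are placeholders, not the paper's five classes. The sketch shows the general shape of lexicon-based polarity scoring, not the authors' pipeline.

```python
import random
import re
from typing import Dict, List, Sequence, Tuple

# HYPOTHETICAL polarity lexicon (not from the paper): positive scores lean
# toward malignancy, negative scores toward benign findings.
LEXICON: Dict[str, int] = {
    "benign": -2,
    "fibroadenoma": -2,
    "inflammatory": -1,
    "atypical": 1,
    "pleomorphism": 2,
    "suspicious": 2,
    "malignant": 3,
    "carcinoma": 3,
}

# HYPOTHETICAL inclusive upper score bounds for the first four classes;
# any score above the last bound falls into the fifth class.
THRESHOLDS: List[Tuple[int, str]] = [
    (-2, "class_1"),
    (0, "class_2"),
    (2, "class_3"),
    (4, "class_4"),
]
TOP_CLASS = "class_5"


def to_row(report: str) -> Dict[str, int]:
    """Normalize free text into one table row: counts of lexicon terms."""
    row: Dict[str, int] = {}
    for token in re.findall(r"[a-z]+", report.lower()):
        if token in LEXICON:
            row[token] = row.get(token, 0) + 1
    return row


def polarity(row: Dict[str, int]) -> int:
    """Total polarity: sum of term scores weighted by their counts."""
    return sum(LEXICON[term] * count for term, count in row.items())


def assign_class(score: int) -> str:
    """Map a polarity score to a class label using the thresholds."""
    for upper, name in THRESHOLDS:
        if score <= upper:
            return name
    return TOP_CLASS


def split_80_20(records: Sequence, seed: int = 0) -> Tuple[list, list]:
    """Shuffle reproducibly, then cut 80% for training and 20% for testing."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 80 // 100
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    report = "Marked nuclear pleomorphism; features suspicious for carcinoma."
    row = to_row(report)
    print(row, polarity(row), assign_class(polarity(row)))
    train, test = split_80_20(range(2879))
    print(len(train), len(test))
```

Run as a script, the sketch prints the term counts, a total polarity of 7, and `class_5` for the sample report. It then prints the sizes of the two partitions. With 2,879 records, this floor-based cut yields 2,303 training and 576 testing records. Seeding the shuffle keeps the split reproducible. A stratified split would also preserve class proportions.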

RESULTS

The random forest and feed-forward neural network classifiers performed best. They reached an accuracy of 99.36%, an overall area under the receiver operating characteristic curve (ROC-AUC) of 99.2%, a correlation with ground truth of 98.3%, and a histopathologic correlation of 98.6%.
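The abstract reports accuracy and an "overall" ROC-AUC but does not say how the multiclass AUC was averaged. The Python sketch below assumes a macro-averaged one-vs-rest AUC, which is one common convention. Each class's AUC is the probability that a randomly chosen record of that class receives a higher score for that class than a randomly chosen record of another class. The function names and the per-record score dictionaries are illustrative assumptions, and the sketch does not reproduce the reported figures.

```python
from typing import Dict, List, Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of records whose predicted class equals the true class."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def binary_auc(pos: Sequence[float], neg: Sequence[float]) -> float:
    """ROC-AUC as P(random positive outscores random negative); ties count 0.5."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true: Sequence[str], scores: List[Dict[str, float]]) -> float:
    """Average one-vs-rest AUC over classes (needs at least two classes in y_true)."""
    aucs = []
    for c in sorted(set(y_true)):
        # Records of class c are positives; all other records are negatives.
        pos = [s[c] for t, s in zip(y_true, scores) if t == c]
        neg = [s[c] for t, s in zip(y_true, scores) if t != c]
        aucs.append(binary_auc(pos, neg))
    return sum(aucs) / len(aucs)
```

The pairwise comparison grows with the product of positives and negatives per class. That cost is acceptable for a test set of a few hundred records. A rank-based (Mann-Whitney) formula gives the same value in O(n log n).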

CONCLUSION

Natural language processing has great potential to automate the extraction of clinical features of breast lesions from free-text reports. The proposed multilevel and multiclass approach classifies 13 types of breast lesions, with 20 distinct labels, into five classes. These classes help a physician or oncologist decide which type of treatment a patient should receive.

Similar articles

1
Natural Language Processing Approaches for Automated Multilevel and Multiclass Classification of Breast Lesions on Free-Text Cytopathology Reports.
JCO Clin Cancer Inform. 2022 Sep;6:e2200036. doi: 10.1200/CCI.22.00036.
2
Classifying social determinants of health from unstructured electronic health records using deep learning-based natural language processing.
J Biomed Inform. 2022 Mar;127:103984. doi: 10.1016/j.jbi.2021.103984. Epub 2022 Jan 7.
3
Identification of patients' smoking status using an explainable AI approach: a Danish electronic health records case study.
BMC Med Res Methodol. 2024 May 17;24(1):114. doi: 10.1186/s12874-024-02231-4.
4
Word2Vec inversion and traditional text classifiers for phenotyping lupus.
BMC Med Inform Decis Mak. 2017 Aug 22;17(1):126. doi: 10.1186/s12911-017-0518-1.
5
A nursing note-aware deep neural network for predicting mortality risk after hospital discharge.
Int J Nurs Stud. 2024 Aug;156:104797. doi: 10.1016/j.ijnurstu.2024.104797. Epub 2024 May 9.
6
Natural Language Processing for Automated Quantification of Brain Metastases Reported in Free-Text Radiology Reports.
JCO Clin Cancer Inform. 2019 Apr;3:1-9. doi: 10.1200/CCI.18.00138.
7
Automation of penicillin adverse drug reaction categorisation and risk stratification with machine learning natural language processing.
Int J Med Inform. 2021 Dec;156:104611. doi: 10.1016/j.ijmedinf.2021.104611. Epub 2021 Oct 5.
8
Artificial Intelligence Learning Semantics via External Resources for Classifying Diagnosis Codes in Discharge Notes.
J Med Internet Res. 2017 Nov 6;19(11):e380. doi: 10.2196/jmir.8344.
9
Improving the performance of machine learning penicillin adverse drug reaction classification with synthetic data and transfer learning.
Intern Med J. 2024 Jul;54(7):1183-1189. doi: 10.1111/imj.16360. Epub 2024 Mar 14.
10
Natural language processing and machine learning approaches for food categorization and nutrition quality prediction compared with traditional methods.
Am J Clin Nutr. 2023 Mar;117(3):553-563. doi: 10.1016/j.ajcnut.2022.11.022. Epub 2022 Dec 23.

Cited by

1
Gaps in Artificial Intelligence Research for Rural Health in the United States: A Scoping Review.
medRxiv. 2025 Jun 27:2025.06.26.25330361. doi: 10.1101/2025.06.26.25330361.