Natural Language Processing Approaches for Automated Multilevel and Multiclass Classification of Breast Lesions on Free-Text Cytopathology Reports.

Author information

Department of Computer Science and Engineering, JSS Science and Technology University, Mysuru, Karnataka, India.

Department of Information Science and Engineering, JSS Science and Technology University, Mysuru, Karnataka, India.

Publication information

JCO Clin Cancer Inform. 2022 Sep;6:e2200036. doi: 10.1200/CCI.22.00036.

DOI:10.1200/CCI.22.00036

PMID:36103641

Abstract

PURPOSE

The rapid growth and use of electronic health records (EHRs) and the expanding medical literature have created major opportunities to automate the extraction of relevant clinical information for concise and effective clinical decision support. However, processing such information has traditionally depended on labor-intensive manual work that is prone to fatigue, oversight, and interobserver variability. This study therefore processes EHRs and performs multilevel and multiclass classification by extracting the dominant characteristic features that are sufficient to detect and differentiate various types of breast lesions.

PATIENTS AND METHODS

This study considers unstructured EHRs on breast lesions obtained through fine-needle aspiration cytology. The raw text was normalized into a structured tabular form and converted to scores by sentiment analysis, which helps to decide the overall polarity, or class label, of each EHR. Two supervised machine learning approaches, random forest and a feed-forward neural network trained with the Levenberg-Marquardt training function, were used to classify the collected EHR data set of 2,879 records, which was split 80:20 into training and testing sets.
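
The scoring and splitting steps described above might be sketched as follows. This is a minimal illustration only, not the authors' pipeline: the abstract does not give the lexicon, the tokenization, or the split procedure, so the term weights in LEXICON, the helper names polarity_score and split_80_20, and the two sample reports are all hypothetical.

```python
import random

# Hypothetical term weights; the abstract does not list the actual lexicon.
LEXICON = {
    "benign": -1.0,
    "fibroadenoma": -1.0,
    "atypical": 0.5,
    "suspicious": 1.0,
    "malignant": 2.0,
    "carcinoma": 2.0,
}

def polarity_score(report: str) -> float:
    """Sum the lexicon weights of the tokens found in a free-text report."""
    tokens = report.lower().replace(",", " ").replace(".", " ").split()
    return sum(LEXICON.get(tok, 0.0) for tok in tokens)

def split_80_20(records, seed=0):
    """Shuffle a copy of the records and split it 80:20 into train and test."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]

# Made-up report snippets, used only to exercise the scoring function.
reports = [
    "Smear shows benign ductal cells, consistent with fibroadenoma.",
    "Cellular smear with atypical cells, suspicious for carcinoma.",
]
scores = [polarity_score(r) for r in reports]

# Record indices stand in for the 2,879 EHRs mentioned in the abstract.
train, test = split_80_20(range(2879))
```

Under these assumptions, an 80:20 split of 2,879 records yields 2,303 training and 576 testing records. The resulting scores would then serve as features for a classifier such as a random forest.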

RESULTS

Random forest and feed-forward neural network classifiers gave the best performance with an accuracy of 99.36%, an overall receiver operating characteristic-area under the curve of 99.2%, a correlation with ground truth of 98.3%, and a histopathologic correlation of 98.6%.

CONCLUSION

Natural language processing has considerable potential to automate the extraction of clinical features of breast lesions. The proposed multilevel and multiclass classification approach classifies 13 types of breast lesions with 20 different labels into five classes, which helps a physician or oncologist decide the type of treatment a patient should receive.
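
The label hierarchy mentioned above (report labels, lesion types, classes) can be pictured as a two-level lookup. The sketch below shows only that hierarchy, not the trained classifiers. The abstract names none of the 20 labels, 13 lesion types, or five classes, so every entry in LABEL_TO_LESION and LESION_TO_CLASS, and the function name classify, is a placeholder.

```python
# Placeholder mappings; the abstract names none of the labels, types, or classes.
LABEL_TO_LESION = {
    "fibroadenoma": "fibroadenoma",
    "fa": "fibroadenoma",
    "ductal carcinoma": "ductal carcinoma",
    "idc": "ductal carcinoma",
    "atypical ductal hyperplasia": "atypical hyperplasia",
}
LESION_TO_CLASS = {
    "fibroadenoma": "benign",
    "atypical hyperplasia": "atypical",
    "ductal carcinoma": "malignant",
}

def classify(label: str) -> tuple[str, str]:
    """Resolve a raw report label to (lesion type, class) in two lookups."""
    lesion = LABEL_TO_LESION[label.strip().lower()]
    return lesion, LESION_TO_CLASS[lesion]
```

For example, classify(" IDC ") resolves first to the lesion type "ductal carcinoma" and then to the class "malignant" under these placeholder tables.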
