Multi-label text classification via secondary use of large clinical real-world data sets.

Affiliations

Steiermärkische Krankenanstaltengesellschaft m.b.H. (KAGes), Billrothgasse 18a, 8010, Graz, Austria.

Institute of Neural Engineering, Graz University of Technology, Stremayrgasse 16/IV, 8010, Graz, Austria.

Publication information

Sci Rep. 2024 Nov 6;14(1):26972. doi: 10.1038/s41598-024-76424-8.

DOI:10.1038/s41598-024-76424-8

PMID:39505974

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11541716/

Abstract

Procedural coding is a taxing task for clinicians. Recent advances in natural language processing, however, offer a promising avenue for applications that assist clinicians and thereby reduce their administrative burden. This study aims to create an application that predicts procedure codes from clinicians' operative notes in order to streamline their workflow. We fine-tuned two German medical BERT models, an existing one and a native one, in a secondary-use scenario, using already coded surgery notes to model the coding procedure as a multi-label classification task. For comparison with the transformer-based architecture, we also evaluated the non-contextual model fastText, a convolutional neural network, a support vector machine and logistic regression. About 350,000 notes were used for model adaptation. Considering the top five suggested procedure codes, medBERT.de, surgeryBERT.at, fastText, the convolutional neural network, the support vector machine and logistic regression achieved a mean average precision of 0.880, 0.867, 0.870, 0.851, 0.870 and 0.805, respectively. For surgery reports with a sequence length greater than 512, the support vector machine performed best, with a mean average precision of 0.872 compared with 0.840 for fastText, 0.837 for medBERT.de and 0.820 for surgeryBERT.at. A prototype front-end application for coding support was also implemented. Predicting procedure codes from an operative report can thus be modelled successfully as a multi-label classification task, with promising performance. The support vector machine, a classical machine learning method, outperformed the non-contextual fastText approach. fastText, with lower hardware requirements, reached performance similar to the BERT-based models and proved more suitable for explaining predictions efficiently.
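The abstract reports mean average precision over the top five suggested codes but does not spell out the formula. The sketch below shows one common way to compute such a score, under stated assumptions: average precision is normalised by min(k, number of true codes), each ranked list contains no duplicate codes, and the function names and the example codes ("A1", "B2", ...) are hypothetical placeholders, not taken from the study. The authors' exact definition may differ.

```python
from typing import List, Sequence, Set


def average_precision_at_k(ranked_codes: Sequence[str],
                           true_codes: Set[str],
                           k: int = 5) -> float:
    """Average precision over the top-k ranked code suggestions for one note.

    Assumption: normalised by min(k, number of true codes); other
    normalisations exist and the study may use a different one.
    """
    if not true_codes:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, code in enumerate(ranked_codes[:k], start=1):
        if code in true_codes:
            hits += 1
            # Precision at this rank, counted only where a correct code appears.
            precision_sum += hits / rank
    return precision_sum / min(k, len(true_codes))


def mean_average_precision_at_k(all_ranked: Sequence[Sequence[str]],
                                all_true: Sequence[Set[str]],
                                k: int = 5) -> float:
    """Mean of the per-note average precision values."""
    scores: List[float] = [
        average_precision_at_k(ranked, true, k)
        for ranked, true in zip(all_ranked, all_true)
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical example: two operative notes, each with five ranked suggestions.
example_ranked = [
    ["A1", "B2", "C3", "D4", "E5"],
    ["X9", "B2", "A1", "C3", "D4"],
]
example_true = [{"A1", "C3"}, {"B2"}]

example_score = mean_average_precision_at_k(example_ranked, example_true, k=5)
print(round(example_score, 3))
```

In this toy example the first note has correct codes at ranks 1 and 3 and the second at rank 2, so a higher score rewards placing the correct procedure codes near the top of the suggestion list.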


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6153/11541716/88e55e9fe9b2/41598_2024_76424_Fig1_HTML.jpg
