Advanced Sampling Technique in Radiology Free-Text Data for Efficiently Building Text Mining Models by Deep Learning in Vertebral Fracture.

Authors

Hung Wei-Chieh, Lin Yih-Lon, Lin Chi-Wei, Chin Wei-Leng, Wu Chih-Hsing

Affiliations

Department of Family and Community Medicine, E-Da Hospital, I-Shou University, Kaohsiung 82445, Taiwan.

School of Medicine, I-Shou University, Kaohsiung 84001, Taiwan.

Publication information

Diagnostics (Basel). 2024 Jan 8;14(2):137. doi: 10.3390/diagnostics14020137.

Abstract

This study aims to establish advanced sampling methods for free-text data so that semantic text mining models can be built efficiently with deep learning, for example to identify vertebral compression fracture (VCF) in radiology reports. We included a total of 27,401 free-text radiology reports of spinal X-ray examinations. We compared the predictive performance of text mining models built with supervised long short-term memory networks, each derived independently from one of four sampling methods (vector sum minimization, vector sum maximization, stratified sampling, and simple random sampling) at four fixed sampling percentages. For each combination of sampling method and ratio, the drawn samples formed the training set and the remaining samples were used for validation. Predictive accuracy for identifying VCF was measured by the area under the receiver operating characteristic curve (AUROC). At sampling ratios of 1/10, 1/20, 1/30, and 1/40, vector sum minimization achieved the highest AUROC: 0.981 (95% CI: 0.980-0.983), 0.963 (95% CI: 0.961-0.965), 0.907 (95% CI: 0.904-0.911), and 0.895 (95% CI: 0.891-0.899), respectively. Vector sum maximization showed the lowest AUROC. This study proposes an advanced sampling method for free-text data, vector sum minimization, that can be applied to build text mining models efficiently by drawing a small number of critical, representative samples.
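The abstract names the four sampling methods but does not describe how the vector-sum selection is carried out. The sketch below illustrates one possible reading, offered only as an assumption: each report is represented by an embedding vector, and samples are picked greedily so that the norm of the running vector sum stays as small (minimization) or as large (maximization) as possible. The function names, the greedy strategy, and the proportional allocation used for stratified sampling are all hypothetical and are not taken from the paper.

```python
import numpy as np


def greedy_vector_sum_sample(vectors, k, minimize=True):
    """Greedily pick k row indices by the norm of the running vector sum.

    Assumption: this greedy rule is only an illustrative interpretation of
    "vector sum minimization/maximization", not the authors' procedure.
    """
    vectors = np.asarray(vectors, dtype=float)
    n = len(vectors)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of vectors")
    available = np.ones(n, dtype=bool)
    running_sum = np.zeros(vectors.shape[1])
    chosen = []
    for _ in range(k):
        # Norm of the sum if each candidate were added next.
        norms = np.linalg.norm(running_sum + vectors, axis=1)
        # Exclude rows that were already selected.
        norms[~available] = np.inf if minimize else -np.inf
        idx = int(np.argmin(norms) if minimize else np.argmax(norms))
        chosen.append(idx)
        available[idx] = False
        running_sum += vectors[idx]
    return chosen


def simple_random_sample(n, k, seed=0):
    """Draw k distinct indices uniformly at random from range(n)."""
    rng = np.random.default_rng(seed)
    return rng.choice(n, size=k, replace=False).tolist()


def stratified_sample(labels, k, seed=0):
    """Draw about k indices, allocated to each class in proportion to its size."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    chosen = []
    for cls in np.unique(labels):
        members = np.flatnonzero(labels == cls)
        # Proportional allocation, with at least one sample per class.
        quota = max(1, int(round(k * len(members) / len(labels))))
        quota = min(quota, len(members))
        chosen.extend(rng.choice(members, size=quota, replace=False).tolist())
    return chosen


if __name__ == "__main__":
    # Synthetic stand-ins for report embeddings and VCF labels (not study data).
    rng = np.random.default_rng(42)
    embeddings = rng.normal(size=(400, 8))
    labels = (rng.random(400) < 0.2).astype(int)
    k = len(embeddings) // 10  # a 1/10 sampling ratio

    min_idx = greedy_vector_sum_sample(embeddings, k, minimize=True)
    max_idx = greedy_vector_sum_sample(embeddings, k, minimize=False)
    print("norm of sum (min):", np.linalg.norm(embeddings[min_idx].sum(axis=0)))
    print("norm of sum (max):", np.linalg.norm(embeddings[max_idx].sum(axis=0)))
    print("stratified size:", len(stratified_sample(labels, k)))
    print("random size:", len(simple_random_sample(len(embeddings), k)))
```

Under this reading, the indices returned by a sampler would form the training set and all remaining reports would form the validation set, mirroring the split described in the abstract.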

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2123/10814913/0552a9753070/diagnostics-14-00137-g001.jpg
