Liu Wenjuan, Zhang Xi, Lv Han, Li Jia, Liu Yawen, Yang Zhenghan, Weng Xutao, Lin Yucong, Song Hong, Wang Zhenchang
Department of Radiology, Beijing Friendship Hospital, Capital Medical University, Beijing, China.
School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China.
Front Oncol. 2022 Nov 21;12:913806. doi: 10.3389/fonc.2022.913806. eCollection 2022.
BACKGROUND: Medical imaging is critical in clinical practice, and high-value radiological reports can meaningfully assist clinicians. However, methods for determining the value of these reports are lacking.

OBJECTIVE: The purpose of this study was to build an ensemble learning classification model that applies natural language processing (NLP) to the Chinese free text of radiological reports in order to determine their value for liver lesion detection in patients with colorectal cancer (CRC).

METHODS: Radiological reports of upper abdominal computed tomography (CT) and magnetic resonance imaging (MRI) were divided into five categories according to the results of liver lesion detection in patients with CRC. NLP methods, including word segmentation, stop-word removal, and construction of an n-gram language model, were applied to each dataset. A bag-of-words model was then built, high-frequency words were selected as features, and an ensemble learning classification model was constructed. Several machine learning methods were applied, including logistic regression (LR), random forest (RF), support vector machine (SVM), and XGBoost. We compared the accuracy of a priori selection of pertinent word strings with that of our machine learning methods.

RESULTS: The dataset of 2790 patients included CT without contrast (10.2%), CT with/without contrast (73.3%), MRI without contrast (1.8%), and MRI with/without contrast (14.6%). The ensemble learning classification model determined the value of reports effectively, reaching an accuracy of 95.91% on the CT with/without contrast dataset with XGBoost. LR, RF, and SVM also achieved good classification accuracy, reaching 95.89%, 95.04%, and 95.00%, respectively. The XGBoost results were visualized with a confusion matrix; very few errors occurred in categories I, II, and V. ELI5 was used to identify important words for each category. Words such as "no abnormality", "suggest", "fatty liver", and "transfer" (metastasis) showed relatively strong positive correlations with classification accuracy. The string pattern search model was less accurate than the machine learning models.

CONCLUSIONS: The NLP-based ensemble learning classification model was an effective tool for determining the value of radiological reports focused on liver lesions. This approach makes it possible to analyze the value of medical imaging examinations on a large scale.
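The METHODS paragraph chains several feature-extraction steps: word segmentation, stop-word removal, n-gram construction, a bag-of-words model, and selection of high-frequency terms as features. The sketch below illustrates those steps in plain Python under stated assumptions: reports are assumed to be already word-segmented (the abstract does not name the segmenter used for the Chinese text), tokens are English glosses, and the stop-word list, the toy reports, and the `top_k` cutoff are hypothetical.

```python
from collections import Counter

# Hypothetical stop-word list (English glosses); the study's actual list is not given.
STOP_WORDS = {"the", "is", "in", "of", "and"}


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop stop words from a report that has already been word-segmented."""
    return [t for t in tokens if t not in stop_words]


def ngrams(tokens, n_max=2):
    """Return every contiguous n-gram for n = 1..n_max, joined with spaces."""
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def build_vocabulary(docs, top_k):
    """Keep the top_k most frequent n-grams across the corpus as features."""
    counts = Counter(g for doc in docs for g in doc)
    return [g for g, _ in counts.most_common(top_k)]


def bag_of_words(doc, vocab):
    """Count how often each vocabulary n-gram occurs in one document."""
    counts = Counter(doc)
    return [counts[g] for g in vocab]


# Usage with toy, pre-segmented reports (invented for illustration).
reports = [
    ["the", "liver", "is", "no", "abnormality"],
    ["fatty", "liver"],
    ["suggest", "liver", "metastasis"],
]
docs = [ngrams(remove_stop_words(r)) for r in reports]
vocab = build_vocabulary(docs, top_k=5)
features = [bag_of_words(d, vocab) for d in docs]
```

Each row of `features` is one report's count vector; together with the category labels, such rows would be the input to classifiers like LR, RF, SVM, or XGBoost. In scikit-learn, `CountVectorizer` with `ngram_range` and `max_features` performs a comparable n-gram bag-of-words with a corpus-frequency cutoff in one step, provided the segmented text is joined with spaces.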
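The abstract compares an a priori string pattern search with the learned classifiers and visualizes errors with a confusion matrix. A minimal sketch of such a baseline and of the evaluation is shown below. The rules, the category names (`cat_A`, `cat_B`, `cat_C`, `cat_other`), and the toy reports are all hypothetical: the abstract lists neither the study's word strings nor the definitions of categories I to V.

```python
# Hypothetical a priori rules: (category, patterns). Neither the patterns nor the
# category names come from the study; they only illustrate the baseline's shape.
RULES = [
    ("cat_C", ["metastasis"]),
    ("cat_B", ["fatty liver"]),
    ("cat_A", ["no abnormality"]),
]
DEFAULT_LABEL = "cat_other"


def pattern_classify(report, rules=RULES, default=DEFAULT_LABEL):
    """Return the label of the first rule whose pattern occurs in the report text."""
    for label, patterns in rules:
        if any(p in report for p in patterns):
            return label
    return default


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels, in the order of `labels`."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(y_true, y_pred):
    """Fraction of reports whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy usage (invented reports and labels).
reports = ["no abnormality in liver", "fatty liver noted", "suggest metastasis", "small cyst"]
y_true = ["cat_A", "cat_B", "cat_C", "cat_B"]
y_pred = [pattern_classify(r) for r in reports]
labels = ["cat_A", "cat_B", "cat_C", "cat_other"]
matrix = confusion_matrix(y_true, y_pred, labels)
```

Rule order matters here because the first matching pattern wins, which is one reason hand-written rules tend to be brittle. Correct predictions lie on the matrix diagonal, so accuracy equals the trace divided by the number of reports; the same two evaluation functions could score a learned classifier's predictions for a side-by-side comparison.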