Natural language processing and machine learning algorithm to identify brain MRI reports with acute ischemic stroke.

Affiliations

Department of Neurology, Hallym University College of Medicine, Chuncheon, Korea.

Medical University of South Carolina, Charleston, South Carolina, United States of America.

Publication information

PLoS One. 2019 Feb 28;14(2):e0212778. doi: 10.1371/journal.pone.0212778. eCollection 2019.

Abstract

BACKGROUND AND PURPOSE

This project assessed the performance of natural language processing (NLP) and machine learning (ML) algorithms for classifying brain MRI radiology reports into acute ischemic stroke (AIS) and non-AIS phenotypes.

MATERIALS AND METHODS

All brain MRI reports from a single academic institution over a two-year period were randomly divided into two groups for ML: training (70%) and testing (30%). Using the "quanteda" NLP package, all text data were parsed into tokens to create the data frequency matrix. Ten-fold cross-validation was applied for bias correction of the training set. Labeling for AIS was performed manually by identifying clinical notes. We applied binary logistic regression, naïve Bayesian classification, a single decision tree, and a support vector machine as the binary classifiers, and we assessed the performance of the algorithms by F1-measure. We also assessed how n-grams and term frequency-inverse document frequency weighting affected the performance of the algorithms.
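The preprocessing described above (tokenization, n-grams, a document-by-term count matrix, and optional term frequency-inverse document frequency weighting) can be sketched in a few lines. The study used the R package quanteda; the Python below is only a standard-library illustration, not the authors' code. The example reports, the function names, and the particular weighting formula (count × log(N / document frequency)) are all assumptions made for this sketch.

```python
import math
import re
from collections import Counter


def tokenize(text, ngram_max=1):
    """Lowercase, keep alphabetic words, and append n-grams up to ngram_max."""
    words = re.findall(r"[a-z]+", text.lower())
    tokens = list(words)
    for n in range(2, ngram_max + 1):
        tokens += ["_".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return tokens


def build_dfm(docs, ngram_max=1):
    """Return (vocab, matrix); matrix[i][j] counts term vocab[j] in document i."""
    counts = [Counter(tokenize(d, ngram_max)) for d in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[t] for t in vocab] for c in counts]
    return vocab, matrix


def tfidf(matrix):
    """Weight raw counts by log(N / document frequency), one common variant."""
    n_docs = len(matrix)
    n_terms = len(matrix[0])
    # Every vocabulary term occurs in at least one document, so df[j] >= 1.
    df = [sum(1 for row in matrix if row[j] > 0) for j in range(n_terms)]
    return [[row[j] * math.log(n_docs / df[j]) for j in range(n_terms)]
            for row in matrix]


# Hypothetical report snippets, invented for illustration only.
reports = [
    "Acute infarction in the left MCA territory with restricted diffusion.",
    "No acute infarction. Chronic small vessel ischemic changes.",
    "Restricted diffusion in the right thalamus, consistent with acute infarction.",
]

vocab, dfm = build_dfm(reports, ngram_max=2)
weighted = tfidf(dfm)
```

With `ngram_max=2`, the negated phrase in the second invented report produces a `no_acute` feature that unigram counts alone cannot represent, which illustrates the kind of context bigrams can add. Under this weighting variant, a term that occurs in every document (here `acute_infarction`) receives a weight of zero.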

RESULTS

Of all 3,204 brain MRI documents, 432 (14.3%) were labeled as AIS. AIS documents were longer in character length than non-AIS documents (median [interquartile range]: 551 [377-681] vs. 309 [164-396]). Of all ML algorithms, the single decision tree had the highest F1-measure (93.2) and accuracy (98.0%). Adding bigrams to the ML model improved the F1-measure of naïve Bayesian classification but not of the other algorithms, and applying term frequency-inverse document frequency weighting to the data frequency matrix did not yield any additional performance improvement.
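Because AIS reports are a minority class, accuracy alone can look high even for a classifier that never predicts AIS, which is one reason to report the F1-measure (the harmonic mean of precision and recall) alongside it. The sketch below shows both metrics computed from predicted and true labels. The label vectors and function names are hypothetical, and the function returns F1 as a fraction between 0 and 1, whereas the abstract reports it on a 0-100 scale.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, false negatives, true negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of documents whose predicted label matches the true label."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def f1_measure(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive (AIS) class."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    if tp == 0:
        return 0.0  # no true positives: define F1 as zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for ten reports: 1 = AIS, 0 = non-AIS.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
all_negative = [0] * 10                     # never predicts AIS
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]     # one hit, one miss, one false alarm

print(accuracy(y_true, all_negative), f1_measure(y_true, all_negative))
print(accuracy(y_true, y_pred), f1_measure(y_true, y_pred))
```

In this invented example both prediction vectors have the same accuracy, but only the second has a nonzero F1-measure, since the first never identifies an AIS report.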

CONCLUSIONS

Supervised ML-based NLP algorithms are useful for automatic classification of brain MRI reports to identify AIS patients. The single decision tree was the best classifier for identifying brain MRI reports with AIS.
