Automating Ischemic Stroke Subtype Classification Using Machine Learning and Natural Language Processing.

Authors

Garg Ravi, Oh Elissa, Naidech Andrew, Kording Konrad, Prabhakaran Shyam

Affiliations

Department of Neurology, Northwestern University, Feinberg School of Medicine, Chicago, Illinois.

University of Pennsylvania, Philadelphia, Pennsylvania.

Publication information

J Stroke Cerebrovasc Dis. 2019 Jul;28(7):2045-2051. doi: 10.1016/j.jstrokecerebrovasdis.2019.02.004. Epub 2019 May 15.

Abstract

OBJECTIVE

Manual adjudication of disease classification is time-consuming and error-prone, and it limits scaling to large datasets. In ischemic stroke (IS), subtype classification is critical for management and outcome prediction. This study sought to use natural language processing of electronic health records (EHR) combined with machine learning methods to automate IS subtyping.

METHODS

Among IS patients from an observational registry with TOAST subtyping adjudicated by board-certified vascular neurologists, we used natural language processing to analyze unstructured text-based EHR data, including neurology progress notes and neuroradiology reports. We applied several feature selection methods to reduce the high dimensionality of the features and used 5-fold cross-validation to test the generalizability of our methods and minimize overfitting. We used several machine learning methods and calculated kappa values for agreement between each machine learning approach and manual adjudication. We then performed blinded testing of the best algorithm on a held-out subset of 50 cases.
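The 5-fold cross-validation step can be illustrated with a minimal sketch. The abstract does not say how the folds were built, so the function name `k_fold_indices`, the shuffling, and the seed below are all assumptions for illustration, not the authors' procedure. Each case lands in the test fold exactly once, and a model would be trained on the remaining four folds each time.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Hypothetical helper: shuffles case indices once, deals them into k
    folds of near-equal size, and uses each fold as the test set in turn.
    """
    if k < 2 or k > n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at position i.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx


if __name__ == "__main__":
    # Toy run on 23 made-up cases; real inputs would be patient records.
    for fold_number, (train, test) in enumerate(k_fold_indices(23, k=5), start=1):
        print(fold_number, len(train), len(test))
```

Agreement with manual adjudication would then be computed on each test fold, so that no case is scored by a model that saw it during training.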

RESULTS

Compared with manual classification, the best machine-based classification achieved a kappa of .25 using radiology reports alone, .57 using progress notes alone, and .57 using combined data. Kappa values varied by subtype, being highest for cardioembolic (.64) and lowest for cryptogenic cases (.47). In the held-out test subset, machine-based classification agreed with rater classification in 40 of 50 cases (kappa .72).
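For readers unfamiliar with the statistic, Cohen's kappa corrects raw agreement for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of matching labels and p_e is the chance agreement implied by each rater's label frequencies. The sketch below is a plain-Python illustration; the function name `cohen_kappa` and the toy labels are assumptions, and the study's own case-level labels are not reproduced here.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(rater_a)
    # p_o: fraction of cases where the two raters assign the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # p_e: chance agreement from the product of marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout; agreement is total.
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Made-up labels for illustration only (CE = cardioembolic,
    # LAA = large-artery atherosclerosis).
    manual = ["CE", "CE", "LAA", "LAA"]
    machine = ["CE", "LAA", "LAA", "LAA"]
    print(cohen_kappa(manual, machine))
```

As a rough consistency check on the held-out result, 40 agreements in 50 cases is an observed agreement of .80, and a kappa of .72 then corresponds to a chance agreement of roughly .29, since (.80 - .29) / (1 - .29) is approximately .72.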

CONCLUSIONS

Automated machine learning approaches using textual data from the EHR show agreement with manual TOAST classification. The automated pipeline, if externally validated, could enable large-scale stroke epidemiology research.
