Lee Ho-Joon, Schwamm Lee H, Sansing Lauren H, Kamel Hooman, de Havenon Adam, Turner Ashby C, Sheth Kevin N, Krishnaswamy Smita, Brandt Cynthia, Zhao Hongyu, Krumholz Harlan, Sharma Richa
Department of Genetics and Yale Center for Genome Analysis, Yale School of Medicine, New Haven, CT, USA.
Department of Neurology and Comprehensive Stroke Center, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA.
NPJ Digit Med. 2024 May 17;7(1):130. doi: 10.1038/s41746-024-01120-w.
Determining acute ischemic stroke (AIS) etiology is fundamental to secondary stroke prevention but can be diagnostically challenging. We trained and validated an automated classification tool, StrokeClassifier, using electronic health record (EHR) text from 2039 non-cryptogenic AIS patients at 2 academic hospitals to predict a 4-level stroke etiology outcome, adjudicated by agreement of at least 2 board-certified vascular neurologists who reviewed the EHR. StrokeClassifier is an ensemble consensus meta-model of 9 machine learning classifiers applied to features extracted from discharge summary texts by natural language processing. StrokeClassifier was externally validated on 406 discharge summaries from the MIMIC-III dataset, which a vascular neurologist reviewed to ascertain stroke etiology. Compared with vascular neurologists' diagnoses, StrokeClassifier achieved a mean cross-validated accuracy of 0.74 and a weighted F1 of 0.74 for multi-class classification. In MIMIC-III, its accuracy and weighted F1 were 0.70 and 0.71, respectively. In binary classification, accuracy and weighted F1 ranged from 0.77 to 0.96. The top 5 features contributing to stroke etiology prediction were atrial fibrillation, age, middle cerebral artery occlusion, internal carotid artery occlusion, and frontal stroke location. We designed a certainty heuristic that grades the confidence of a non-cryptogenic diagnosis from StrokeClassifier by the degree of consensus among the 9 classifiers, and applied it to 788 cryptogenic patients, reducing cryptogenic diagnoses from 25.2% to 7.2%. StrokeClassifier is a validated artificial intelligence tool that rivals the performance of vascular neurologists in classifying ischemic stroke etiology. With further training, StrokeClassifier may have downstream applications, including use as a clinical decision support system.
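The consensus idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the 9 classifiers are combined or where the certainty cut-offs lie, so the plain majority vote, the function names `consensus` and `certainty_grade`, the thresholds of 7 and 5 agreeing classifiers, and the placeholder labels `class_a` to `class_c` are all assumptions made for illustration.

```python
from collections import Counter


def consensus(votes):
    """Return (label, n_agree) for the most frequent prediction.

    `votes` holds one predicted etiology label per base classifier.
    Ties are broken by first occurrence in `votes`, which is an
    arbitrary choice made for this sketch.
    """
    if not votes:
        raise ValueError("votes must not be empty")
    label, n_agree = Counter(votes).most_common(1)[0]
    return label, n_agree


def certainty_grade(n_agree, high_min=7, moderate_min=5):
    """Map the number of agreeing classifiers to a certainty grade.

    The cut-offs are hypothetical; the source only states that
    certainty is graded by the degree of consensus.
    """
    if n_agree >= high_min:
        return "high"
    if n_agree >= moderate_min:
        return "moderate"
    return "low"


# Nine hypothetical base-classifier predictions for one patient.
votes = ["class_a"] * 7 + ["class_b"] * 2
label, n_agree = consensus(votes)
grade = certainty_grade(n_agree)
```

Under these assumed thresholds, a cryptogenic case would be reassigned to a determined etiology only when the grade is high enough, which is one way the degree of consensus could translate into fewer cryptogenic diagnoses.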