Betts Joshua W, Still John M, Lasko Thomas A
Vanderbilt University School of Medicine, 2209 Garland Ave, Nashville, TN.
Vanderbilt University Medical Center, Department of Biomedical Informatics, 2525 West End Ave, Nashville, TN.
ArXiv. 2025 Apr 22:arXiv:2505.04631v1.
Migraine is a common but complex neurological disorder that doubles the lifetime risk of cryptogenic stroke (CS). However, this relationship remains poorly characterized, and few clinical guidelines exist to reduce the associated risk. We therefore propose a data-driven approach that extracts probabilistically independent sources from electronic health record (EHR) data and uses them to build a 10-year risk-prediction model for CS in migraine patients. These sources represent external latent variables acting on the causal graph constructed from the EHR data, and they approximate root causes of CS in our population. A random forest model trained on patient expressions of these sources showed good discriminative performance (area under the ROC curve 0.771) and identified the 10 sources most predictive of CS in migraine patients. These sources indicated that pharmacologic interventions were the most important factor in minimizing CS risk in our population, and they identified a factor related to allergic rhinitis as a potential causative source of CS in migraine patients.
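The pipeline described in the abstract (extract independent sources, train a random forest on per-patient source expressions, score it by ROC AUC, then rank sources by predictive importance) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the source-extraction algorithm, so FastICA from scikit-learn is swapped in as one common choice, and the data, the outcome model, and all sizes and variable names are synthetic and hypothetical.

```python
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for EHR data (assumption): each observed feature is a
# noisy linear mixture of a few non-Gaussian latent sources.
n_patients, n_sources, n_features = 2000, 5, 20
true_sources = rng.laplace(size=(n_patients, n_sources))
mixing = rng.normal(size=(n_sources, n_features))
X = true_sources @ mixing + 0.1 * rng.normal(size=(n_patients, n_features))

# Hypothetical binary outcome driven by the first latent source only.
logit = 1.5 * true_sources[:, 0] - 1.0
y = (rng.random(n_patients) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Step 1: estimate independent sources on the training split only
# (FastICA is an assumed choice of algorithm).
ica = FastICA(n_components=n_sources, random_state=0, max_iter=1000)
S_train = ica.fit_transform(X_train)  # per-patient source expressions
S_test = ica.transform(X_test)

# Step 2: train a random forest on the source expressions.
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(S_train, y_train)

# Step 3: evaluate discrimination with the area under the ROC curve.
auc = roc_auc_score(y_test, forest.predict_proba(S_test)[:, 1])

# Step 4: rank sources from most to least predictive (impurity-based importance).
ranking = np.argsort(forest.feature_importances_)[::-1]

print(f"ROC AUC on held-out patients: {auc:.3f}")
print("Sources ranked by importance:", ranking.tolist())
```

Fitting the source model on the training split alone keeps the held-out AUC free of leakage. Impurity-based importances are only one way to rank sources; the abstract does not state which ranking method was used.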