Faisal Fahim, Danelakis Antonios, Bjørk Marte-Helene, Winsvold Bendik, Matharu Manjit, Nachev Parashkev, Hagen Knut, Tronvik Erling, Stubberud Anker
Norhead Norwegian Centre for Headache Research, NTNU Norwegian University of Science and Technology, Trondheim, Norway.
Department of Neuromedicine and Movement Science, NTNU Norwegian University of Science and Technology, Trondheim, Norway.
J Headache Pain. 2025 Apr 7;26(1):70. doi: 10.1186/s10194-025-02014-2.
Migraine is associated with a range of symptoms and comorbid disorders and has a strong genetic basis, but the currently identified risk loci explain only a small portion of the heritability, a gap often termed the "missing heritability". We aimed to investigate whether machine learning could exploit the combination of genetic data and general clinical features to identify individuals at risk of new-onset migraine.
This was a population-based cohort study of adults who participated in the second and third waves of the Trøndelag Health Study (HUNT2 and HUNT3). Migraine was ascertained with a validated questionnaire based on modified criteria of the International Classification of Headache Disorders (ICHD), and participants underwent genome-wide genotyping. The primary outcome was new-onset migraine, defined as a change in disease status from headache-free in HUNT2 to migraine in HUNT3. The migraine risk variants identified in the largest GWAS meta-analysis of migraine were used to select the genetic input features for the models. The general clinical features used as predictors included demographics, selected comorbidities, medication and stimulant use, and non-headache symptoms. Several standard machine learning architectures were constructed, trained, optimized and scored by the area under the receiver operating characteristic curve (AUC). The model that performed best during training and validation was then applied to unseen test sets. Different methods for model explainability were employed.
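As an illustration of the scoring metric named above, the following is a minimal sketch of how an AUC can be computed from predicted risk scores. It uses the pairwise (Mann-Whitney) formulation: the AUC equals the probability that a randomly chosen case receives a higher score than a randomly chosen non-case, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are assumptions made for illustration; they are not the study's code or data.

```python
def roc_auc(labels, scores):
    """Pairwise AUC: share of (case, non-case) pairs ranked correctly.

    labels: iterable of 0/1 outcomes (1 = new-onset case; hypothetical data)
    scores: iterable of predicted risk scores, same length as labels
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # case ranked above non-case
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Toy example (made-up values, not study data)
labels = [0, 0, 1, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.70, 0.90]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The quadratic loop is kept for readability; for cohorts of the size described here, a rank-based implementation or an established library routine would be the more practical choice.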
A total of 12,995 individuals were included in the predictive modelling (491 new-onset cases). A total of 108 genetic variants and 67 general clinical variables were included in the models. The top-performing decision-tree classifier achieved a test set AUC of 0.56 when using only genotypic data, 0.68 when using only clinical data and 0.72 when using both genetic and clinical data. Combining the genotype-only and clinical-only models yielded lower predictive performance, with an AUC of 0.67. The most important features were the clinical variables age, marital status and work situation, together with several genetic variants.
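The case counts above imply a strongly imbalanced outcome, which can be made explicit with a short calculation. The sketch below uses only the two figures reported in the abstract (12,995 individuals, 491 new-onset cases); the variable names are illustrative. With an imbalance of this size, a model that predicted "no migraine" for everyone would reach a high accuracy while identifying no cases, which is one reason a ranking metric such as AUC is more informative than raw accuracy in this setting.

```python
# Figures reported in the abstract
n_total = 12_995   # individuals in the predictive modelling
n_cases = 491      # new-onset migraine cases

n_controls = n_total - n_cases
prevalence = n_cases / n_total              # share of new-onset cases
controls_per_case = n_controls / n_cases    # class imbalance ratio
trivial_accuracy = n_controls / n_total     # accuracy of always predicting "no migraine"

print(f"non-cases:          {n_controls}")
print(f"new-onset share:    {prevalence:.1%}")
print(f"non-cases per case: {controls_per_case:.1f}")
print(f"trivial accuracy:   {trivial_accuracy:.1%}")
```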
The combination of genotype and routine demographic and non-headache clinical data correctly predicts new-onset migraine in approximately 2 out of 3 cases, supporting the view that important genotype-phenotype interactions are involved in the onset of migraine.