Geisel School of Medicine at Dartmouth, Hanover, NH, USA.
Department of Neurology, Dartmouth Health, Lebanon, NH, USA.
BMC Neurol. 2024 Aug 3;24(1):272. doi: 10.1186/s12883-024-03760-7.
Rare neurologic diseases (RNDs) are frequently diagnosed late, yet RNDs and their comorbidities remain difficult to study because their rarity leaves analyses statistically underpowered. Stiff person syndrome (SPS), an RND characterized by painful muscle spasms and rigidity, affects one to two in a million people annually. Leveraging underutilized electronic health records (EHR), this study showcased a machine-learning-based framework to identify the clinical features that best characterize the diagnosis of SPS.
A machine-learning-based feature selection approach was applied to 319 items from the past medical histories of 48 individuals (23 with a diagnosis of SPS and 25 controls) with elevated serum autoantibodies against glutamic-acid-decarboxylase-65 (anti-GAD65) in Dartmouth Health's EHR to determine the features with the highest discriminatory power. Each iteration of the algorithm fitted a Support Vector Machine (SVM) model, generated an importance score (a SHapley Additive exPlanations, or SHAP, value) for each feature, and removed the least salient feature. Evaluation metrics were calculated through repeated stratified cross-validation.
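The iterative elimination described above can be illustrated with a minimal sketch. It is not the authors' implementation: the study scored features with SHAP values from an SVM, whereas this sketch substitutes a much simpler stand-in score (the absolute difference in item prevalence between cases and controls) so that it runs with the standard library alone. The function names, the `keep` parameter, and the toy records are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Record = Dict[str, int]  # one patient: history item -> 0/1 indicator
Scorer = Callable[[Sequence[Record], Sequence[int], Sequence[str]], Dict[str, float]]


def prevalence_gap(rows: Sequence[Record], labels: Sequence[int],
                   features: Sequence[str]) -> Dict[str, float]:
    """Stand-in importance score (NOT SHAP): |prevalence in cases - prevalence in controls|."""
    cases = [r for r, y in zip(rows, labels) if y == 1]
    controls = [r for r, y in zip(rows, labels) if y == 0]
    scores = {}
    for f in features:
        p_case = sum(r[f] for r in cases) / len(cases)
        p_ctrl = sum(r[f] for r in controls) / len(controls)
        scores[f] = abs(p_case - p_ctrl)
    return scores


def backward_eliminate(rows: Sequence[Record], labels: Sequence[int],
                       features: Sequence[str], importance: Scorer,
                       keep: int) -> List[str]:
    """Re-score the remaining features each round and drop the least important one."""
    remaining = list(features)
    while len(remaining) > keep:
        scores = importance(rows, labels, remaining)
        weakest = min(remaining, key=lambda f: scores[f])  # ties: first listed
        remaining.remove(weakest)
    return remaining


# Synthetic toy records for illustration only; these are not study data.
toy_rows = [
    {"item_a": 1, "item_b": 1, "item_c": 0, "item_d": 1},
    {"item_a": 1, "item_b": 0, "item_c": 1, "item_d": 0},
    {"item_a": 1, "item_b": 1, "item_c": 0, "item_d": 1},
    {"item_a": 0, "item_b": 0, "item_c": 0, "item_d": 1},
    {"item_a": 0, "item_b": 0, "item_c": 1, "item_d": 0},
    {"item_a": 0, "item_b": 1, "item_c": 0, "item_d": 1},
]
toy_labels = [1, 1, 1, 0, 0, 0]  # 1 = case, 0 = control

selected = backward_eliminate(
    toy_rows, toy_labels, ["item_a", "item_b", "item_c", "item_d"],
    importance=prevalence_gap, keep=2,
)
print(selected)
```

Because the scorer is passed in as a function, a model-based score such as mean absolute SHAP value from a fitted classifier could be swapped in without changing the elimination loop.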
Depression, hypothyroidism, gastroesophageal reflux disease (GERD), and joint pain were the most characteristic features of SPS. Using these features, the SVM model attained a precision of 0.817 (95% CI 0.795-0.840), sensitivity of 0.766 (95% CI 0.743-0.790), F-score of 0.761 (95% CI 0.744-0.778), AUC of 0.808 (95% CI 0.791-0.825), and accuracy of 0.775 (95% CI 0.759-0.790).
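For readers unfamiliar with the reported metrics, the sketch below shows how precision, sensitivity, F-score, and accuracy follow from a confusion matrix, and one common way to summarize per-fold scores as a mean with a 95% confidence interval. The abstract does not state how its intervals were derived; the normal approximation used here is an assumption, and the function names and example numbers are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Dict, Sequence, Tuple


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard metrics from confusion-matrix counts (0.0 when a denominator is zero)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + sensitivity
    f_score = 2 * precision * sensitivity / denom if denom else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "sensitivity": sensitivity,
            "f_score": f_score, "accuracy": accuracy}


def mean_ci95(values: Sequence[float]) -> Tuple[float, float, float]:
    """Mean and normal-approximation 95% CI across folds (assumed method, not the paper's)."""
    m = mean(values)
    half_width = 1.96 * stdev(values) / math.sqrt(len(values))
    return m, m - half_width, m + half_width


# Illustrative numbers only; not taken from the study.
print(classification_metrics(tp=8, fp=2, fn=2, tn=8))
print(mean_ci95([0.70, 0.80, 0.90]))
```

With many repetitions of cross-validation the interval around the mean narrows, which is one reason intervals reported this way can be tight even for a small cohort.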
This framework discerned features that, with further research, may help fully characterize the pathologic mechanism of SPS: depression, hypothyroidism, and GERD may respectively represent comorbidities through common inflammatory, genetic, and dysautonomic links. This methodology could address diagnostic challenges in neurology by uncovering latent associations and generating hypotheses for RNDs.