Feature Engineering for the Prediction of Scoliosis in 5q-Spinal Muscular Atrophy.

Author information

Vu-Han Tu-Lan, Sunkara Vikram, Bermudez-Schettino Rodrigo, Schwechten Jakob, Runge Robin, Perka Carsten, Winkler Tobias, Pokutta Sebastian, Weiß Claudia, Pumberger Matthias

Affiliations

Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität Zu Berlin, Center for Musculoskeletal Surgery, Berlin, Germany.

Explainable AI for Biology, Zuse Institute Berlin, Berlin, Germany.

Publication information

J Cachexia Sarcopenia Muscle. 2025 Feb;16(1):e13599. doi: 10.1002/jcsm.13599. Epub 2024 Dec 5.

Abstract

BACKGROUND

5q-Spinal muscular atrophy (SMA) is now among the roughly 5% of rare diseases worldwide that are treatable. As disease-modifying therapies alter disease progression and patient phenotypes, paediatricians and consulting disciplines face new unknowns in their treatment decisions. Conclusions drawn from historical patient data sets are now of limited use, and new approaches are needed to maintain best standard-of-care practices for this exceptional patient group. Here, we present a data-driven machine learning approach, applied to a rare disease data set, to predict SMA-associated scoliosis.

METHODS

We collected data from 84 genetically confirmed 5q-SMA patients who had received novel SMA therapies. We performed feature engineering directed by expert domain knowledge and used correlation and predictive power score (PPS) analyses for feature selection. To test the predictive performance of the selected features, we trained a Random Forest Classifier and evaluated model performance using standard metrics.
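
As an illustration of the PPS step described above, the following is a minimal sketch of a predictive power score for a single numeric feature against a binary label. It is a simplified re-implementation of the general PPS idea (a one-feature decision tree scored by cross-validated weighted F1, normalised against a naive baseline), not the authors' code. The function name `simple_pps`, the synthetic `age` and `scoliosis` arrays, and all parameter values are assumptions made for illustration only.

```python
import numpy as np
from sklearn.metrics import f1_score
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeClassifier


def simple_pps(feature, target, cv=4, seed=0):
    """Simplified PPS of one numeric feature for a class label (illustrative)."""
    X = np.asarray(feature, dtype=float).reshape(-1, 1)
    y = np.asarray(target)

    # Out-of-fold predictions from a decision tree that sees only this feature.
    tree = DecisionTreeClassifier(random_state=seed)
    pred = cross_val_predict(tree, X, y, cv=cv)
    model_f1 = f1_score(y, pred, average="weighted", zero_division=0)

    # Naive baselines: always predict the most frequent class, or guess by
    # shuffling the true labels. The stronger of the two is the reference.
    values, counts = np.unique(y, return_counts=True)
    most_frequent = np.full_like(y, values[np.argmax(counts)])
    shuffled = np.random.default_rng(seed).permutation(y)
    naive_f1 = max(
        f1_score(y, most_frequent, average="weighted", zero_division=0),
        f1_score(y, shuffled, average="weighted", zero_division=0),
    )

    if naive_f1 >= 1.0:
        return 0.0
    # 0 means no better than the naive baseline; 1 means perfect prediction.
    return max(0.0, (model_f1 - naive_f1) / (1.0 - naive_f1))


# Synthetic data only: these values are not taken from the study.
rng = np.random.default_rng(0)
n = 400
age = rng.uniform(0, 18, n)                               # hypothetical age in years
noise = rng.normal(size=n)                                # feature unrelated to the label
scoliosis = (age + rng.normal(0, 2, n) > 9).astype(int)   # synthetic binary label

pps_age = simple_pps(age, scoliosis)
pps_noise = simple_pps(noise, scoliosis)
print(f"PPS(age) = {pps_age:.2f}, PPS(noise) = {pps_noise:.2f}")
```

In this sketch the informative synthetic feature should score well above the unrelated one. Unlike a correlation coefficient, a score of this kind is asymmetric and can pick up non-linear relationships, which may be why the two analyses are used together.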

RESULTS

The SMA data set consisted of 1304 visits and over 360 variables. We performed feature engineering for variables related to 'interventions', 'devices', 'orthosis', 'ventilation', 'muscle contractures' and 'motor milestones'. Through correlation and PPS analysis paired with expert domain knowledge feature selection, we identified relevant features for scoliosis prediction in SMA that included disease progression markers: Hammersmith Functional Motor Scale Expanded 'HFMSE' (PPS = 0.27) and 6-Minute Walk Test '6MWT' scores (PPS = 0.44), 'age' (PPS = 0.41) and 'weight' (PPS = 0.49), 'contractures' (PPS = 0.17), the use of 'assistive devices' (PPS = 0.39), 'ventilation' (PPS = 0.16) and the presence of 'gastric tubes' (PPS = 0.35) in SMA patients. These features were validated using expert domain knowledge and used to train a Random Forest Classifier with an observed accuracy of 0.82 and an average receiver operating characteristic (ROC) area of 0.87.
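
The evaluation reported above (accuracy and ROC area for a Random Forest Classifier) can be sketched as follows. This is a minimal example on synthetic data and does not reproduce the study's results. The abstract does not state how the data were split or whether the target was binary, so the binary label, the patient-grouped hold-out split (which keeps all visits of one patient on the same side of the split), the hypothetical `patient_id` array and all hyperparameters are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import GroupShuffleSplit

# Synthetic stand-in: 1304 rows ("visits") and 8 features. Only the number of
# rows and features mirrors the abstract; the values are random.
X, y = make_classification(
    n_samples=1304, n_features=8, n_informative=5, random_state=0
)

# Hypothetical patient identifiers: each visit is assigned to one of 84 patients.
rng = np.random.default_rng(0)
patient_id = rng.integers(0, 84, size=X.shape[0])

# Assumed design choice: hold out whole patients, so that repeated visits of
# the same patient never appear in both the training and the test set.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=patient_id))

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X[train_idx], y[train_idx])

pred = clf.predict(X[test_idx])
proba = clf.predict_proba(X[test_idx])[:, 1]  # probability of the positive class

accuracy = accuracy_score(y[test_idx], pred)
roc_auc = roc_auc_score(y[test_idx], proba)
print(f"accuracy = {accuracy:.2f}, ROC AUC = {roc_auc:.2f}")
```

Grouping by patient is one way to avoid optimistic estimates when a data set contains many visits per patient; whether the authors did this is not stated in the abstract.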

CONCLUSION

The introduction of disease-modifying SMA therapies, followed by the inclusion of SMA in newborn screening programmes, has presented physicians with a patient population not seen before. We used feature engineering tools to overcome one of the main challenges of applying data-driven approaches to rare disease data sets. Through predictive modelling of these data, we defined disease progression markers that are easily assessed during patient visits and can help anticipate scoliosis onset. This highlights the importance of progression-related features in the drug-induced revolution of this rare disease and further supports the ongoing efforts to update the SMA classification. We advocate for the consistent documentation of relevant progression markers, which will serve as a basis for data-driven models that physicians can use to update their best standard-of-care practices.
