Garami Attila, Czabajszki Máté, Viskolcz Béla, Oláh Csaba, Váradi Csaba
Institute of Energy, Ceramic and Polymer Technology, University of Miskolc, 3515 Miskolc, Hungary.
Department of Neurosurgery, Borsod-Abaúj-Zemplén County Center Hospital and University Teaching Hospital, 3526 Miskolc, Hungary.
Int J Mol Sci. 2025 Aug 10;26(16):7727. doi: 10.3390/ijms26167727.
Subarachnoid hemorrhage (SAH) and its major complication, cerebral vasospasm (CVS), present significant challenges for early diagnosis and risk stratification. In this study, we developed interpretable decision tree models to differentiate between healthy controls, SAH patients, and SAH patients with vasospasm using serum N-glycomic data. Building on previously published glycomic profiles, we introduced a refined modeling approach combining systematic preprocessing, feature selection, and interpretable machine learning. Our methodology included outlier removal, standard scaling, and a novel correlation-based feature reduction guided by feature importance scores derived from preliminary decision trees. Three binary classification tasks (Control vs. SAH, Control vs. CVS, and SAH vs. CVS) were evaluated through stratified repeated cross-validation and hyperparameter optimization. Models achieved high accuracy (up to 0.91) and stable F1-scores across configurations. Key glycans such as FA2(6)G1 (bi-antennary, fucosylated, monogalactosylated), A4G4S3(2) (tetra-antennary, tetra-galactosylated, tri-sialylated), and A3G3S3(5) (tri-antennary, tri-galactosylated, tri-sialylated) emerged as the most discriminative. Visualizations that combine joint feature distributions and decision boundaries provided intuitive insight into the classifier's logic. These findings support the integration of interpretable glycomics-based models into clinical workflows.
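The abstract does not spell out how the correlation-based feature reduction works, so the following is only a minimal sketch of one plausible reading: features are visited from most to least important, and a feature is kept only if it is not strongly correlated with one already kept. The function names, the 0.9 threshold, the greedy rule, the feature labels, and all numeric values are assumptions made for illustration; they are not taken from the paper, and the importance scores are supplied by hand rather than derived from a fitted decision tree.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # constant feature: treat as uncorrelated
    return cov / (sd_x * sd_y)


def reduce_correlated(features, importances, threshold=0.9):
    """Greedy importance-guided reduction (assumed rule, not the paper's).

    features:    dict mapping feature name -> list of values per sample
    importances: dict mapping feature name -> importance score
    Returns the names of the kept features, most important first.
    """
    ranked = sorted(features, key=lambda name: importances.get(name, 0.0),
                    reverse=True)
    kept = []
    for name in ranked:
        # Keep the feature only if it is weakly correlated with every
        # feature that has already been kept.
        if all(abs(pearson(features[name], features[k])) <= threshold
               for k in kept):
            kept.append(name)
    return kept


# Invented toy values; "glycan_b" is nearly proportional to "glycan_a".
features = {
    "glycan_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "glycan_b": [2.1, 4.0, 6.2, 8.1, 9.9],
    "glycan_c": [5.0, 1.0, 4.0, 2.0, 3.0],
}
# Hand-set scores standing in for importances from a preliminary tree.
importances = {"glycan_a": 0.6, "glycan_b": 0.1, "glycan_c": 0.3}

selected = reduce_correlated(features, importances)
print(selected)
```

In this toy case the less important member of the correlated pair is dropped, which is the intended effect of letting importance scores decide which of two redundant features survives. In a full pipeline the scores would presumably come from a preliminary decision tree fitted on scaled data, and the reduced feature set would then go into the cross-validated models.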