Pezoulas Vasileios C, Papaloukas Costas, Veyssiere Maëva, Goules Andreas, Tzioufas Athanasios G, Soumelis Vassili, Fotiadis Dimitrios I
Unit of Medical Technology and Intelligent Information Systems, Department of Materials Science and Engineering, University of Ioannina, Ioannina GR45110, Greece.
Department of Biological Applications and Technology, University of Ioannina, Ioannina GR45100, Greece.
Comput Struct Biotechnol J. 2021 May 24;19:3058-3068. doi: 10.1016/j.csbj.2021.05.036. eCollection 2021.
Unlike autoimmune diseases, systemic autoinflammatory diseases (SAIDs) have no known constitutive, disease-defining biomarker. Kawasaki disease (KD) is one of the "undiagnosed" types of SAIDs, and its pathogenic mechanism and causative gene mutation remain unknown. To address this issue, we developed a sequential computational workflow that clusters KD patients with similar gene expression profiles across the three KD phases (Acute, Subacute and Convalescent) and uses the resulting clustermap to detect prominent genes that can serve as diagnostic biomarkers for KD. Self-Organizing Maps (SOMs) were employed to cluster patients with similar gene expression profiles across the three phases through inter-phase and intra-phase clustering. False discovery rate (FDR)-based feature selection was then applied to detect genes whose expression deviates significantly across the per-phase clusters. Our results revealed five genes as candidate biomarkers for KD diagnosis, namely HLA-DQB1, HLA-DRA, ZBTB48, TNFRSF13C, and CASD1. To our knowledge, these five genes are reported here for the first time in the literature as candidate KD biomarkers. The impact of the discovered genes on KD diagnosis, relative to the known ones, was demonstrated by training boosting ensembles (AdaBoost and XGBoost) for KD classification on common-platform and cross-platform datasets. Compared with classifiers trained on the known genes, the classifiers trained on the proposed genes from the common-platform data yielded average increases of 4.40% in accuracy, 5.52% in sensitivity, and 3.57% in specificity in the Acute and Subacute phases, followed by increases of 2.30% in accuracy, 2.20% in sensitivity, and 4.70% in specificity in the cross-platform analysis.
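The FDR-based feature selection step can be illustrated with a minimal sketch. The abstract does not say which FDR procedure was used, so the Benjamini-Hochberg step-up procedure is assumed here. The function name `benjamini_hochberg`, the gene labels, and the p-values are hypothetical placeholders, not data or code from the study; in practice the p-values would come from a per-gene test of expression differences across the per-phase clusters.

```python
from typing import Dict, List


def benjamini_hochberg(p_values: Dict[str, float], alpha: float = 0.05) -> List[str]:
    """Return the genes whose p-values pass the Benjamini-Hochberg FDR threshold.

    Assumption: one p-value per gene, e.g. from a test of whether the gene's
    expression differs across clusters. The abstract does not specify the test.
    """
    m = len(p_values)
    if m == 0:
        return []

    # Rank genes by ascending p-value.
    ranked = sorted(p_values.items(), key=lambda item: item[1])

    # Step-up rule: find the largest 1-based rank k with p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, (_, p) in enumerate(ranked, start=1):
        if p <= rank / m * alpha:
            cutoff_rank = rank

    # Every gene ranked at or below the cutoff is selected, even if its own
    # p-value exceeded the threshold for its rank.
    return [gene for gene, _ in ranked[:cutoff_rank]]


# Hypothetical p-values for five placeholder genes (not results from the paper).
example_p_values = {
    "gene_a": 0.001,
    "gene_b": 0.008,
    "gene_c": 0.035,
    "gene_d": 0.039,
    "gene_e": 0.60,
}

selected_genes = benjamini_hochberg(example_p_values, alpha=0.05)
print(selected_genes)
```

In this toy input, `gene_c` misses the threshold for its own rank (0.035 > 0.03) but is still selected because `gene_d` passes at rank 4 (0.039 <= 0.04). This is the step-up behaviour that distinguishes Benjamini-Hochberg from a fixed per-gene cutoff.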