Department of Bioinformatics, KRIBB School of Bioscience, Korea University of Science and Technology (UST), 217 Gajeong-ro, Yuseong-gu, Daejeon, Korea.
Department of Environmental Disease Research Center, Korea Research Institute of Bioscience & Biotechnology (KRIBB), 125 Gwahak-ro, Yuseong-gu, Daejeon, 34141, Korea.
J Transl Med. 2021 Jun 7;19(1):250. doi: 10.1186/s12967-021-02909-z.
Inflammatory bowel disease (IBD) is a chronic, idiopathic inflammatory disorder of the gastrointestinal tract that comprises ulcerative colitis (UC) and Crohn's disease (CD). CD can affect any part of the gastrointestinal tract but mainly involves the terminal ileum and colon. In the present study, we aimed to characterize terminal-ileal CD (ICD) and colonic CD (CCD) at the molecular level, which might enable a more optimized approach to the clinical care and scientific research of CD.
We analyzed differentially expressed genes (DEGs) in samples from 23 treatment-naïve paediatric patients with CD and 25 non-IBD controls, and compared the data with previously published RNA-Seq data using multiple statistical tests and confidence intervals. We performed functional profiling and proposed statistical methods for feature selection using a logistic regression model to identify genes that are highly associated with ICD or CCD. We also validated our final candidate genes in independent paediatric and adult cohorts.
We identified 550 genes specifically expressed in patients with CD compared with healthy controls (p < 0.05). Among these DEGs, 240 from patients with CCD were mainly involved in mitochondrial dysfunction, whereas 310 from patients with ICD were enriched in ileal functions such as digestion, absorption, and metabolism. To choose the most effective gene set, we used logistic regression to select the genes with the strongest discriminative performance (p-value ≤ 0.05, accuracy ≥ 0.8, and AUC ≥ 0.8). Consequently, 33 genes were identified as useful for discriminating CD location; the accuracy and AUC were 0.86 and 0.83, respectively. We then validated the 33 genes with data from an independent paediatric cohort (accuracy = 0.93, AUC = 0.92) and an adult cohort (accuracy = 0.88, AUC = 0.72).
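The threshold-based selection described above (p-value ≤ 0.05, accuracy ≥ 0.8, AUC ≥ 0.8) can be illustrated with a minimal sketch. This is not the study's pipeline. It assumes one univariate logistic regression per gene, in-sample accuracy and AUC, and a likelihood-ratio test for the p-value; the abstract does not specify any of these details. The function names, the label coding (1 = ICD, 0 = CCD), and the input layout are hypothetical.

```python
import math

def _sigmoid(z):
    # Clip the linear predictor so exp() cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def _loglik(b0, b1, x, y):
    # Bernoulli log-likelihood of the model P(y=1) = sigmoid(b0 + b1*x).
    total = 0.0
    for xi, yi in zip(x, y):
        p = _sigmoid(b0 + b1 * xi)
        total += math.log(p) if yi == 1 else math.log(1.0 - p)
    return total

def fit_logistic(x, y, max_iter=100, tol=1e-8):
    """Newton-Raphson with step halving; returns (b0, b1, log-likelihood)."""
    b0 = b1 = 0.0
    ll = _loglik(b0, b1, x, y)
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = _sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det < 1e-12:  # near-singular Hessian (e.g. perfect separation)
            break
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        step, new_ll = 1.0, ll
        while step > 1e-6:
            new_ll = _loglik(b0 + step * d0, b1 + step * d1, x, y)
            if new_ll >= ll:
                break
            step /= 2.0
        else:
            break  # no improving step found
        b0, b1 = b0 + step * d0, b1 + step * d1
        converged = abs(new_ll - ll) < tol
        ll = new_ll
        if converged:
            break
    return b0, b1, ll

def auc(scores, labels):
    """Rank-based AUC (Mann-Whitney); ties count as half a win."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def evaluate_gene(x, y):
    """Return the p-value, accuracy and AUC of a one-gene logistic model."""
    b0, b1, ll = fit_logistic(x, y)
    n1 = sum(y)
    n0 = len(y) - n1
    # Intercept-only (null) log-likelihood.
    ll0 = n1 * math.log(n1 / len(y)) + n0 * math.log(n0 / len(y))
    lr = max(0.0, 2.0 * (ll - ll0))
    p_value = math.erfc(math.sqrt(lr / 2.0))  # chi-square survival, 1 d.f.
    probs = [_sigmoid(b0 + b1 * xi) for xi in x]
    preds = [1 if p >= 0.5 else 0 for p in probs]
    accuracy = sum(int(a == b) for a, b in zip(preds, y)) / len(y)
    return {"p_value": p_value, "accuracy": accuracy, "auc": auc(probs, y)}

def select_genes(expr, labels, p_max=0.05, acc_min=0.8, auc_min=0.8):
    """Keep genes whose one-gene model meets all three thresholds.

    expr: dict mapping gene name -> list of expression values (hypothetical layout).
    labels: list of 0/1 class labels (assumed here: 1 = ICD, 0 = CCD).
    """
    selected = []
    for gene, values in expr.items():
        s = evaluate_gene(values, labels)
        if s["p_value"] <= p_max and s["accuracy"] >= acc_min and s["auc"] >= auc_min:
            selected.append(gene)
    return selected
```

Calling `select_genes(expr, labels)` with a dictionary of per-gene expression vectors returns the names of the genes that pass all three cut-offs. A real analysis would typically estimate accuracy and AUC on held-out data and correct the p-values for multiple testing; both are omitted here for brevity.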
In summary, we identified DEGs that are specifically expressed in CCD and ICD compared with healthy controls and patients with UC. Based on the feature selection analysis, 33 genes were identified as useful for discriminating CCD from ICD with high accuracy and AUC, not only in our paediatric patients but also in independent cohorts. We propose that our approach and the final gene set are useful for the molecular classification of patients with CD and could be beneficial for treatments based on disease location.