IEEE/ACM Trans Comput Biol Bioinform. 2019 Nov-Dec;16(6):2078-2088. doi: 10.1109/TCBB.2018.2831212. Epub 2018 Apr 30.
Inflammatory Bowel Disease (IBD) is an umbrella term for a group of inflammatory diseases of the gastrointestinal tract, including Crohn's Disease and ulcerative colitis. Changes to the intestinal microbiome, the community of micro-organisms that resides in the human gut, have been shown to contribute to the pathogenesis of IBD. IBD diagnosis is often delayed because its symptoms are non-specific and confirmation requires an invasive colonoscopy; this delay leads to poor growth in children and worse treatment outcomes. Feature selection algorithms are often applied to microbial communities to identify bacterial groups that drive disease. It has been shown that aggregating Ensemble Feature Selection (EFS) can improve the robustness of feature selection algorithms, where robustness is assessed by the variation in feature selector output caused by small changes to the dataset. In this work, we apply a two-step filter and an EFS process to generate robust feature subsets that can non-invasively predict IBD subtypes from high-resolution microbiome data. The predictive power of the robust feature subsets is the highest reported in the literature to date. Furthermore, we identify five biologically plausible bacterial species that have not previously been implicated in IBD aetiology.
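To make the idea of aggregating EFS and its robustness concrete, here is a minimal sketch in Python. It is not the authors' pipeline: the abstract does not specify the filters, the aggregation rule, or the robustness metric, so everything below is an assumption chosen for illustration. The scoring rule (absolute difference of class means), the mean-rank aggregation, the Jaccard-based stability measure, the toy data, and all function names (`score_features`, `ensemble_select`, `mean_jaccard`, `make_toy_data`) are hypothetical, and the two-step filter is not reproduced.

```python
import random
from itertools import combinations
from statistics import mean


def make_toy_data(n_per_class=20, n_features=6, seed=1):
    """Synthetic stand-in for abundance data: features 0 and 1 separate the classes."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.random() for _ in range(n_features)]
            if label == 1:
                row[0] += 10.0  # assumed strong signal, for illustration only
                row[1] += 10.0
            X.append(row)
            y.append(label)
    return X, y


def score_features(X, y):
    """Toy filter (an assumption): absolute difference between class means."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(mean(pos) - mean(neg)))
    return scores


def bootstrap(X, y, rng):
    """Resample with replacement within each class, so both classes stay present."""
    idx = []
    for label in (0, 1):
        members = [i for i, v in enumerate(y) if v == label]
        idx += [rng.choice(members) for _ in members]
    return [X[i] for i in idx], [y[i] for i in idx]


def ensemble_select(X, y, k, n_rounds=25, seed=0):
    """Run the filter on bootstrap samples and aggregate the rankings by rank sum."""
    rng = random.Random(seed)
    n_features = len(X[0])
    rank_sums = [0] * n_features
    subsets = []
    for _ in range(n_rounds):
        Xb, yb = bootstrap(X, y, rng)
        scores = score_features(Xb, yb)
        order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
        for rank, j in enumerate(order):
            rank_sums[j] += rank
        subsets.append(set(order[:k]))  # what a single selector would have picked
    final = set(sorted(range(n_features), key=lambda j: rank_sums[j])[:k])
    return final, subsets


def mean_jaccard(subsets):
    """Average pairwise Jaccard similarity; 1.0 means every round chose the same subset."""
    return mean(len(a & b) / len(a | b) for a, b in combinations(subsets, 2))


if __name__ == "__main__":
    X, y = make_toy_data()
    final, subsets = ensemble_select(X, y, k=2)
    print("aggregated subset:", sorted(final))
    print("mean pairwise Jaccard:", mean_jaccard(subsets))
```

The per-round subsets show how much a single selector's output moves when the data are perturbed, while the aggregated subset is the ensemble's answer. On real microbiome data the per-round subsets would typically disagree more than in this deliberately easy toy example, which is the situation where aggregation is expected to help.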