BioSense Institute, University of Novi Sad, Novi Sad, Serbia.
Faculty of Technical Sciences, University of Novi Sad, Novi Sad, Serbia.
Stud Health Technol Inform. 2021 Oct 27;285:165-170. doi: 10.3233/SHTI210591.
In this study, we investigate faecal microbiota composition to evaluate how well classification algorithms identify inflammatory bowel disease (IBD) and its two types, Crohn's disease (CD) and ulcerative colitis (UC). Of the many algorithms investigated, a random forest (RF) classifier was selected for detailed evaluation in a three-class classification task (CD versus UC versus nonIBD) and two binary classification tasks (nonIBD versus IBD, and CD versus UC). We addressed class imbalance and performed an extensive parameter search, dimensionality reduction and two-level classification. In three-class classification, our best model reaches an average F1 score of 91%, which confirms the strong connection between IBD and the gastrointestinal microbiome. Among the most important features in three-class classification are the species Staphylococcus hominis, Porphyromonas endodontalis and Slackia piriformis, and the genus Bacteroidetes.
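As a minimal illustration of two ideas named in the abstract, the sketch below shows one plausible reading of "two-level classification" (screen nonIBD versus IBD first, then separate CD from UC) and of an F1 score averaged over the three classes (macro averaging). The abstract does not specify either detail, so both are assumptions; the function names and the toy labels are hypothetical and are not taken from the study.

```python
from typing import Callable, Sequence

# Class labels as used in the abstract.
LABELS = ("CD", "UC", "nonIBD")


def two_level_predict(sample, is_ibd: Callable, cd_or_uc: Callable) -> str:
    """Assumed two-level scheme: level 1 decides nonIBD vs IBD,
    level 2 (only for IBD samples) decides CD vs UC.
    `is_ibd` and `cd_or_uc` stand in for trained classifiers."""
    if not is_ibd(sample):
        return "nonIBD"
    return cd_or_uc(sample)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str],
             labels: Sequence[str] = LABELS) -> float:
    """Unweighted mean of the per-class F1 scores (macro average)."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); define it as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy labels, invented purely to exercise the functions.
    truth = ["CD", "CD", "UC", "UC", "nonIBD", "nonIBD"]
    guess = ["CD", "UC", "UC", "UC", "nonIBD", "nonIBD"]
    print(f"macro F1 = {macro_f1(truth, guess):.3f}")
```

Macro averaging weights each class equally regardless of its size, which is one common choice when classes are imbalanced; a weighted or per-fold average would give a different number.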