Xu Congmin, Zhou Man, Xie Zhongjie, Li Mo, Zhu Xi, Zhu Huaiqiu
State Key Laboratory for Turbulence and Complex Systems, Department of Biomedical Engineering, College of Engineering, Peking University, Beijing, 100871, China.
Center for Quantitative Biology, Peking University, Beijing, 100871, China.
BioData Min. 2021 Jan 19;14(1):2. doi: 10.1186/s13040-021-00241-2.
Diagnosing inflammatory bowel disease (IBD) and discriminating between its types are clinically important. IBD is associated with marked changes in the intestinal microbiota. Advances in next-generation sequencing (NGS) technology and improved bioinformatics analysis capabilities in hospitals motivated us to develop a diagnostic method based on the gut microbiome.
Using whole-genome sequencing (WGS) data from 349 human gut microbiota samples, covering two types of IBD and healthy controls, we assembled and aligned the WGS short reads to obtain feature profiles at the strain and genus levels. The genus profiles were used to construct the 16S-based diagnostic module, and the strain profiles were used to construct the WGS-based module. We designed a novel feature selection procedure to select case-specific features and, with these features, built discrimination models using different machine learning algorithms. LightGBM outperformed the other algorithms in this study and was therefore chosen as the core algorithm. Specifically, we identified two small sets of strain biomarkers, one for the WGS-based healthy vs IBD module and one for the ulcerative colitis vs Crohn's disease module, which contributed to the optimization of model performance during pre-training.
We released LightCUD, an IBD diagnostic program built with LightGBM. Its performance was validated by five-fold cross-validation and on an independent test data set. LightCUD is implemented in Python and is freely available as an installable package with customized databases. Given WGS data or 16S rRNA sequencing data from gut microbiome samples as input, LightCUD can discriminate IBD from healthy controls with high accuracy and further identify the specific type of IBD. The executable program is released as open source, with instructions, at http://cqb.pku.edu.cn/ZhuLab/LightCUD/ . The identified strain biomarkers could be used to study the critical factors in disease development and to recommend treatments that target changes in the gut microbial community.
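The two-module design described above (healthy vs IBD first, then ulcerative colitis vs Crohn's disease) can be pictured as a simple cascade. The following is only a minimal sketch under stated assumptions, not LightCUD's actual code: the function `diagnose`, the 0.5 threshold, the taxon names, and the toy scoring functions are all hypothetical. In practice each stage would presumably be a trained classifier (for example a `lightgbm.LGBMClassifier`); stand-in callables are used here so the sketch runs without dependencies.

```python
from typing import Callable, Dict

# A profile maps a taxon name (genus or strain) to its relative abundance.
Profile = Dict[str, float]
# A stage model returns the probability of that stage's "positive" class.
StageModel = Callable[[Profile], float]


def diagnose(profile: Profile,
             ibd_model: StageModel,
             subtype_model: StageModel,
             threshold: float = 0.5) -> str:
    """Hypothetical two-stage cascade: healthy vs IBD, then UC vs CD."""
    if ibd_model(profile) < threshold:
        return "healthy"
    # Only samples called IBD reach the second module.
    return "CD" if subtype_model(profile) >= threshold else "UC"


# Toy stand-in models with made-up taxa and weights (illustration only).
def toy_ibd_model(profile: Profile) -> float:
    return min(1.0, profile.get("taxon_A", 0.0) * 2.0)


def toy_subtype_model(profile: Profile) -> float:
    return min(1.0, profile.get("taxon_B", 0.0) * 2.0)
```

Keeping the two decisions in separate models mirrors the abstract's description of separate biomarker sets for each module; the cascade order shown here is an assumption about how the modules are chained.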
As the first released human gut microbiome-based IBD diagnostic tool, LightCUD demonstrates high performance on both WGS and 16S sequencing data. The strains that either distinguish healthy controls from IBD patients or discriminate between the specific types of IBD are expected to be clinically important as biomarkers.