Xu Yuanwei, Nash Katrina, Acharjee Animesh, Gkoutos Georgios V
Institute of Cancer and Genomic Sciences, College of Medical and Dental Sciences, University of Birmingham, Birmingham B15 2TT, UK.
NIHR Surgical Reconstruction and Microbiology Research Centre, Birmingham B15 2TT, UK.
Bioinformatics. 2022 Mar 4;38(6):1639-1647. doi: 10.1093/bioinformatics/btab879.
Existing microbiome-based disease prediction relies on the ability of machine learning methods to differentiate diseased from healthy subjects based on the observed taxa abundance across samples. Although numerous microbes have been implicated as potential biomarkers, challenges remain, owing both to the statistical nature of microbiome data and to the limited understanding of microbial interactions, which can themselves be indicative of disease.
We propose CACONET (classification of Compositional-Aware COrrelation NETworks), a computational framework that learns to classify microbial correlation networks and extracts potential signature interactions, taking as input the relative abundance of taxa across samples and the health status of those samples. Using Bayesian compositional-aware correlation inference, a collection of posterior correlation networks can be drawn and used for graph-level classification, thus incorporating uncertainty in the estimates. CACONET then employs a deep learning approach for graph classification, achieving excellent performance metrics by exploiting the correlation structure. We test the framework on both simulated data and a large real-world dataset of microbiome samples from colorectal cancer (CRC) patients and healthy subjects, and identify potential network substructures characteristic of the CRC microbiota. CACONET is customizable and can be adapted to further improve its utility.
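To make the idea of a collection of correlation networks concrete, the following minimal Python sketch builds several networks from one relative-abundance table. It is an illustration under stated assumptions, not CACONET's implementation: a centred log-ratio (CLR) transform with Pearson correlation stands in for the Bayesian compositional-aware inference, bootstrap resampling of samples stands in for posterior draws, and every function name, the pseudocount and the threshold are hypothetical choices made for this example.

```python
import numpy as np


def clr(rel_abund, pseudocount=1e-6):
    """Centred log-ratio transform of a samples-by-taxa abundance table.

    The pseudocount (an assumption of this sketch) avoids log(0).
    """
    x = np.asarray(rel_abund, dtype=float) + pseudocount
    x = x / x.sum(axis=1, keepdims=True)  # re-close each sample to sum to 1
    logx = np.log(x)
    return logx - logx.mean(axis=1, keepdims=True)


def correlation_network(rel_abund, threshold=0.3):
    """Weighted adjacency matrix: CLR-based Pearson correlations whose
    absolute value reaches the (hypothetical) threshold; others set to 0."""
    corr = np.corrcoef(clr(rel_abund), rowvar=False)  # taxa-by-taxa
    adj = np.where(np.abs(corr) >= threshold, corr, 0.0)
    np.fill_diagonal(adj, 0.0)  # no self-loops
    return adj


def bootstrap_networks(rel_abund, n_networks=10, threshold=0.3, seed=0):
    """Stack of networks from bootstrap resamples of the samples.

    Bootstrap resampling is used here only as a simple stand-in for
    drawing networks from a posterior distribution.
    """
    rel_abund = np.asarray(rel_abund, dtype=float)
    rng = np.random.default_rng(seed)
    n_samples = rel_abund.shape[0]
    nets = []
    for _ in range(n_networks):
        idx = rng.integers(0, n_samples, size=n_samples)
        nets.append(correlation_network(rel_abund[idx], threshold))
    return np.stack(nets)


# Toy data: 40 samples, 8 taxa, each row a composition summing to 1.
demo_rng = np.random.default_rng(1)
abund = demo_rng.dirichlet(np.ones(8), size=40)
nets = bootstrap_networks(abund, n_networks=5)
print(nets.shape)  # (5, 8, 8): five networks over eight taxa
```

In a graph-classification setting, each matrix in such a stack would be treated as one labelled graph (for example, CRC or healthy), so that a classifier sees many plausible networks per group rather than a single point estimate.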
CACONET is available at https://github.com/yuanwxu/corr-net-classify.
Supplementary data are available at Bioinformatics online.