IEEE/ACM Trans Comput Biol Bioinform. 2022 May-Jun;19(3):1302-1312. doi: 10.1109/TCBB.2020.3039326. Epub 2022 Jun 3.
The recent advent of high-throughput sequencing technology has enabled us to study the associations between the human microbiome and disease. The DNA sequences of microbiome samples are clustered into operational taxonomic units (OTUs) according to their similarity. The OTU table, which contains the count of each OTU present in each sample, is used to measure correlations between OTUs and disease status and to find key microbes for predicting disease status. Various statistical methods have been proposed for such microbiome data analysis. However, none of these methods reflects the hierarchical structure of taxonomic information. In this paper, we propose a hierarchical structural component model for microbiome data (HisCoM-microb) that uses taxonomy information as well as OTU table data. The proposed HisCoM-microb consists of two layers: one for OTUs and the other for taxa at the higher taxonomy level. We then simultaneously estimate the coefficients of the OTUs and the taxa in the two layers of the hierarchical model. Through this analysis, we can infer the association between taxa or OTUs and disease status while accounting for the impact of taxonomic structure on disease status. Both a simulation study and an analysis of real microbiome data show that HisCoM-microb can successfully reveal the relations between each taxon and disease status and, at the same time, identify the key OTUs of the disease.
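The two-layer structure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation and performs no estimation: the toy counts, the taxonomy mapping, the relative-abundance normalization, the logistic link, and all function names are assumptions made for illustration. It shows only how OTU-level weights could feed taxon-level components, which in turn feed a disease-status predictor.

```python
import numpy as np

# Hypothetical toy OTU table (4 samples x 5 OTUs); not data from the paper.
otu_counts = np.array([
    [10, 0, 3, 5, 2],
    [4, 6, 0, 5, 5],
    [0, 8, 2, 0, 10],
    [6, 6, 4, 2, 2],
], dtype=float)

# Hypothetical taxonomy: OTUs 0-1 belong to taxon 0, OTUs 2-4 to taxon 1.
otu_to_taxon = np.array([0, 0, 1, 1, 1])
n_taxa = 2


def relative_abundance(counts):
    """Scale each sample (row) so its OTU counts sum to 1 (an assumed normalization)."""
    return counts / counts.sum(axis=1, keepdims=True)


def taxon_components(x, otu_weights, otu_to_taxon, n_taxa):
    """OTU layer: each taxon score is a weighted sum of its member OTUs only."""
    n_otus = x.shape[1]
    membership = np.zeros((n_otus, n_taxa))
    membership[np.arange(n_otus), otu_to_taxon] = 1.0
    # Block-structured weights: an OTU contributes only to its own taxon.
    weights = membership * otu_weights[:, None]
    return x @ weights


def disease_probability(components, taxon_coefs, intercept=0.0):
    """Taxon layer: linear predictor on taxon scores, with an assumed logistic link."""
    eta = intercept + components @ taxon_coefs
    return 1.0 / (1.0 + np.exp(-eta))


x = relative_abundance(otu_counts)
otu_weights = np.array([0.5, -0.2, 0.3, 0.1, 0.4])  # hypothetical OTU-layer weights
taxon_coefs = np.array([1.2, -0.8])                 # hypothetical taxon-layer coefficients

components = taxon_components(x, otu_weights, otu_to_taxon, n_taxa)
probabilities = disease_probability(components, taxon_coefs)
```

In this sketch, the contribution of a single OTU to the linear predictor is the product of its OTU-layer weight and the coefficient of the taxon that contains it, which is one way to read the abstract's statement that OTU and taxon coefficients are estimated simultaneously. The weights here are fixed by hand; fitting them to data is outside the scope of the sketch.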