Luo Qianwen, Zhang Shanshan, Butt Hamza, Chen Yin, Jiang Hongmei, An Lingling
Department of Biosystems Engineering, University of Arizona, Tucson, AZ 85721, United States.
Interdisciplinary Program in Statistics and Data Science, University of Arizona, Tucson, AZ 85721, United States.
Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbae653.
Analyzing sequencing-based microbial count data is challenging because the data contain numerous non-biological zeros, which can impede downstream analysis. To address this, we introduce two novel approaches, PhyImpute and UniFracImpute, which leverage similar microbial samples to identify and impute non-biological zeros in microbial count data. Both methods use the probability of non-biological zeros and phylogenetic trees to estimate sample-to-sample similarity. We evaluate the methods on both simulated and real microbial data. The results demonstrate that PhyImpute and UniFracImpute outperform existing methods in recovering the zeros and in empowering downstream analyses such as differential abundance analysis and disease status classification.
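The general idea of imputing likely non-biological zeros from similar samples can be illustrated with a toy sketch. This is not the PhyImpute or UniFracImpute algorithm: the function name `impute_zeros`, the `threshold` and `k` parameters, and the input matrix `p_nonbio` (an assumed, precomputed probability that each zero is non-biological) are all hypothetical, and a Bray-Curtis dissimilarity on relative abundances stands in for the phylogeny-based similarity described in the abstract.

```python
import numpy as np

def impute_zeros(counts, p_nonbio, threshold=0.5, k=2):
    """Toy neighbor-based imputation (illustrative only, not the published method).

    counts   : samples x taxa matrix of read counts
    p_nonbio : same shape; ASSUMED probability that each zero is non-biological
    threshold: zeros with p_nonbio above this value are imputed (hypothetical)
    k        : number of most similar samples to borrow from (hypothetical)
    """
    counts = np.asarray(counts, dtype=float)
    p_nonbio = np.asarray(p_nonbio, dtype=float)

    # Convert counts to relative abundances, guarding against empty samples.
    totals = counts.sum(axis=1, keepdims=True)
    rel = counts / np.where(totals == 0, 1.0, totals)

    # Bray-Curtis dissimilarity between samples (stand-in for a
    # phylogeny-aware distance such as UniFrac).
    dist = 0.5 * np.abs(rel[:, None, :] - rel[None, :, :]).sum(axis=2)
    np.fill_diagonal(dist, np.inf)  # a sample is never its own neighbor

    imputed = counts.copy()
    flagged = (counts == 0) & (p_nonbio > threshold)

    for i in range(counts.shape[0]):
        taxa = np.flatnonzero(flagged[i])
        if taxa.size == 0:
            continue
        # Weight the k nearest samples by inverse distance.
        neighbors = np.argsort(dist[i])[:k]
        weights = 1.0 / (dist[i, neighbors] + 1e-9)
        weights /= weights.sum()
        # Borrow the neighbors' relative abundance, rescaled to this
        # sample's sequencing depth.
        imputed[i, taxa] = weights @ rel[neighbors][:, taxa] * totals[i, 0]
    return imputed

# Made-up example data: 3 samples x 3 taxa.
counts = [[10, 0, 90],
          [12, 8, 80],
          [0, 50, 50]]
p_nonbio = [[0.0, 0.9, 0.0],
            [0.0, 0.0, 0.0],
            [0.1, 0.0, 0.0]]
result = impute_zeros(counts, p_nonbio, k=1)
print(result)
```

In this made-up example, the zero in the first sample is flagged as likely non-biological and is filled from its nearest neighbor (the second sample), giving a value of about 8, while the zero in the third sample falls below the threshold and is left as a biological zero.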