Wang Shuai, Jiang Yiqi, Li Shuaicheng
Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong.
Bioinformatics. 2021 Apr 1;36(22-23):5499-5506. doi: 10.1093/bioinformatics/btaa1056.
The microbial community plays an essential role in human diseases and physiological activities. The functions of microbes can differ because of strain-level differences in their genome sequences. Shotgun metagenomic sequencing makes it practical to profile the strains in microbial communities. However, current methods remain underdeveloped because sequences are highly similar among strains. We observe that strain genotypes at the same single nucleotide variant (SNV) locus can be inferred from the genotype frequencies. In addition, variants at different loci that are covered by the same read provide evidence that they reside on the same strain.
These insights inspired us to design PStrain, an optimization method that uses genotype frequencies and reads covering multiple SNV loci to profile strains iteratively, based on SNVs in a set of MetaPhlAn2 marker genes. Compared with state-of-the-art methods, PStrain improved the performance of inferring strain abundances and genotypes by 87.75% and 59.45% on average, respectively. We applied the PStrain package to a dataset with two cohorts of colorectal cancer (CRC) and found that the sequences of Bacteroides coprocola strains differ significantly between CRC and control samples. This is the first report of a potential role of B. coprocola in the gut microbiota of CRC.
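The two observations behind the method can be illustrated with a toy sketch. This is not PStrain's algorithm or API: the function names, the biallelic 0/1 allele encoding, the assumption that strain abundances are already known, and the dictionary representation of reads are all hypothetical choices made for illustration. The first function picks the per-strain alleles whose expected alternative-allele frequency best matches the observed one; the second counts allele pairs on reads that span two loci, which is the kind of evidence that places two variants on the same strain.

```python
from collections import Counter
from itertools import product


def best_genotypes(abundances, alt_freq):
    """Return per-strain alleles (0 = reference, 1 = alternative) whose
    abundance-weighted alternative-allele frequency is closest to alt_freq.

    Hypothetical helper: assumes a biallelic locus and known abundances.
    """
    def expected(genotype):
        return sum(a * g for a, g in zip(abundances, genotype))

    # Exhaustive search over all 2^k assignments; fine for a handful of strains.
    return min(
        product((0, 1), repeat=len(abundances)),
        key=lambda genotype: abs(expected(genotype) - alt_freq),
    )


def linkage_counts(reads, locus_a, locus_b):
    """Count allele pairs observed on reads that cover both loci.

    Each read is a dict mapping locus name -> observed allele (assumed format).
    """
    counts = Counter()
    for read in reads:
        if locus_a in read and locus_b in read:
            counts[(read[locus_a], read[locus_b])] += 1
    return counts


# Two strains at 70% and 30% abundance, observed alternative frequency 0.32.
genotype = best_genotypes([0.7, 0.3], 0.32)

# Four toy reads; the last one covers only one of the two loci and is ignored.
reads = [
    {"snv1": 1, "snv2": 0},
    {"snv1": 1, "snv2": 0},
    {"snv1": 0, "snv2": 1},
    {"snv2": 1},
]
pairs = linkage_counts(reads, "snv1", "snv2")
```

In this toy setting the frequency alone suggests that only the less abundant strain carries the alternative allele, while the read counts indicate which allele combinations co-occur on one molecule. The method described above combines both kinds of evidence iteratively; the sketch treats them separately.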
https://github.com/wshuai294/PStrain.
Supplementary data are available at Bioinformatics online.