StrainPanDA: Linked reconstruction of strain composition and gene content profiles via pangenome-based decomposition of metagenomic data.
Authors
Hu Han, Tan Yuxiang, Li Chenhao, Chen Junyu, Kou Yan, Xu Zhenjiang Zech, Liu Yang-Yu, Tan Yan, Dai Lei
Affiliations
CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China.
Bioinformatics Department, Xbiome, Scientific Research Building, Tsinghua High-Tech Park, Shenzhen, China.
Publication information
Imeta. 2022 Aug 1;1(3):e41. doi: 10.1002/imt2.41. eCollection 2022 Sep.
Microbial strains of variable functional capacities coexist in microbiomes. Current bioinformatics methods for strain analysis cannot directly link strain composition to gene content using metagenomic data. Here we present Strain-level Pangenome Decomposition Analysis (StrainPanDA), a novel method that uses the pangenome coverage profile of multiple metagenomic samples to simultaneously reconstruct the composition and gene content variation of coexisting strains in microbial communities. We systematically validate the accuracy and robustness of StrainPanDA using synthetic data sets. To demonstrate the power of gene-centric strain profiling, we then apply StrainPanDA to analyze the gut microbiome samples of infants, as well as patients treated with fecal microbiota transplantation. We show that the linked reconstruction of strain composition and gene content profiles is critical for understanding the relationship between microbial adaptation and strain-specific functions (e.g., nutrient utilization and pathogenicity). Finally, StrainPanDA has minimal requirements for computing resources and can be scaled to process multiple species in a community in parallel. In short, StrainPanDA can be applied to metagenomic data sets to detect the association between molecular functions and microbial/host phenotypes to formulate testable hypotheses and gain novel biological insights at the strain or subspecies level.
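To make the idea of decomposing a pangenome coverage profile concrete, the sketch below factors a toy genes-by-samples coverage matrix into a gene content matrix and a strain composition matrix. The abstract does not state which algorithm StrainPanDA uses, so the choice of nonnegative matrix factorization with multiplicative updates is an assumption made purely for illustration. The function names (`nmf`, `relative_abundance`) are hypothetical, and the coverage values are invented toy numbers, not data from the paper.

```python
import random

def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def frobenius(m):
    return sum(x * x for row in m for x in row) ** 0.5

def nmf(v, rank, iters=1000, seed=0, eps=1e-9):
    """Approximate a nonnegative genes x samples matrix V as W @ H.

    W (genes x rank) plays the role of per-strain gene content and
    H (rank x samples) the role of per-sample strain weights.
    Uses multiplicative updates; this algorithm choice is an assumption.
    """
    rng = random.Random(seed)
    n_genes, n_samples = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_genes)]
    h = [[rng.random() + 0.1 for _ in range(n_samples)] for _ in range(rank)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_samples)]
             for k in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n_genes)]
    return w, h

def relative_abundance(h):
    """Rescale each sample (column of H) so its strain weights sum to 1."""
    totals = [sum(col) for col in zip(*h)]
    return [[x / t if t > 0 else 0.0 for x, t in zip(row, totals)] for row in h]

# Toy coverage matrix (5 gene families x 4 samples), invented for illustration:
# two core genes at constant depth plus accessory genes that track one strain each.
coverage = [
    [20.0, 20.0, 20.0, 20.0],
    [20.0, 20.0, 20.0, 20.0],
    [18.0, 12.0, 6.0, 2.0],
    [2.0, 8.0, 14.0, 18.0],
    [2.0, 8.0, 14.0, 18.0],
]

w, h = nmf(coverage, rank=2)
abund = relative_abundance(h)
residual = [[c - a for c, a in zip(rc, ra)] for rc, ra in zip(coverage, matmul(w, h))]
rel_error = frobenius(residual) / frobenius(coverage)
print("relative reconstruction error:", rel_error)
```

Because such a factorization is not unique, the recovered factors should be read as one plausible decomposition rather than ground truth. Linking both factors through a single model is what ties each strain's abundance to its gene content.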