Department of Computer and Data Sciences, Faculty of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran.
School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran.
BMC Bioinformatics. 2024 Jul 16;25(1):241. doi: 10.1186/s12859-024-05859-7.
BACKGROUND: Using next-generation sequencing technologies, scientists can sequence complex microbial communities directly from the environment. The study of metagenomics has yielded significant insights into the structure, diversity, and ecology of microbial communities. A crucial step in metagenomic analysis is the assembly of reads into longer contigs, which are then binned into groups of contigs that correspond to different species in the metagenomic sample. These contigs must be organized into operational taxonomic units (OTUs) for further taxonomic profiling and functional analysis. For binning, which is synonymous with the clustering of OTUs, the tetra-nucleotide frequency (TNF) is typically used as a compositional feature for each OTU.
RESULTS: In this paper, we present AFIT, a new l-mer statistic vector for each contig, and AFITBin, a novel method for metagenomic binning based on AFIT and a matrix factorization method. To evaluate the performance of the AFIT vector, the t-SNE algorithm is used to compare species clustering based on AFIT and TNF information. In addition, the efficacy of AFITBin is demonstrated on both simulated and real datasets in comparison to state-of-the-art binning methods such as MetaBAT 2, MaxBin 2.0, CONCOCT, MetaCon, SolidBin, BusyBee Web, and MetaBinner. To further analyze the performance of the proposed AFIT vector, we compare the barcodes of the AFIT vector and the TNF vector.
CONCLUSION: The results demonstrate that AFITBin shows superior performance in taxonomic identification compared to existing methods, leveraging the AFIT vector for improved results in metagenomic binning. This approach holds promise for advancing the analysis of metagenomic data, providing more reliable insights into microbial community composition and function.
AVAILABILITY: A Python package is available at: https://github.com/SayehSobhani/AFITBin .
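
As an illustration of the baseline compositional feature named in the background, the following is a minimal sketch of how a tetra-nucleotide frequency (TNF) vector might be computed for a single contig. It is not the AFIT statistic and is not taken from the AFITBin package; the function name `tnf_vector` and its parameters are hypothetical. The sketch counts overlapping 4-mers over the alphabet A, C, G, T, skips windows containing other symbols, and does not merge reverse-complement k-mers, which some binning tools do.

```python
from collections import Counter
from itertools import product


def tnf_vector(contig, k=4):
    """Return normalized k-mer frequencies in lexicographic k-mer order.

    Hypothetical helper for illustration only. With k=4 the result has
    4**4 = 256 entries, one per tetranucleotide.
    """
    seq = contig.upper()
    # All possible k-mers over A, C, G, T, in lexicographic order.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only windows made purely of A/C/G/T contribute to the total.
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


if __name__ == "__main__":
    vec = tnf_vector("ACGTACGT")
    print(len(vec), sum(vec))
```

Each contig then becomes one fixed-length row of a feature matrix, which is the kind of input a clustering or matrix factorization step can operate on.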