CSIR-National Environmental Engineering Research Institute (NEERI), Hyderabad Zonal Centre, IICT Campus, Tarnaka, Hyderabad, Telangana, 500007, India.
Biotechnol Lett. 2024 Oct;46(5):791-805. doi: 10.1007/s10529-024-03509-9. Epub 2024 Jul 6.
Low-quality sequencing data undermine microbial diversity profiling and call for improvements to the bioinformatics workflow. The conventional merging approach discards a large share of sequencing reads when processing low-quality amplicon data, so alternative methods are needed. This study proposes a computational workflow that combines merging and direct-joining: paired-end reads that lack an overlap are concatenated and pooled with the merged sequences. The proposed strategy was compared with two other workflows: the merging approach, in which the paired-end reads are merged, and the direct-joining approach, in which the reads are concatenated. The results showed that the merging approach generates significantly fewer amplicon sequences, limits microbiome inference, and obscures some microbial associations. Compared with the other workflows, the combined merging and direct-joining strategy reduces the loss of amplicon data, improves taxonomic classification, and, importantly, reduces the misleading results associated with the merging approach when analysing low-quality amplicon data. A mock community analysis supports these findings. In summary, researchers are advised to follow the combined merging and direct-joining workflow to avoid problems associated with low-quality data when profiling microbial community structure.
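The combined strategy described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the exact-match overlap search, the function names (`merge_pair`, `direct_join`, `merge_or_join`), the `min_overlap` threshold, and the 10-N spacer placed between concatenated reads are all assumptions made for illustration. Production tools score overlaps using base qualities rather than exact matching.

```python
# Toy illustration of "merge when reads overlap, otherwise direct-join".
# All names, thresholds, and the N spacer are hypothetical choices.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(fwd, rev, min_overlap=10, max_mismatch=0):
    """Merge a read pair on the longest acceptable overlap.

    The reverse read is reverse-complemented, then the suffix of the
    forward read is compared with the prefix of that sequence.
    Returns the merged sequence, or None if no overlap qualifies.
    """
    rev_rc = reverse_complement(rev)
    longest = min(len(fwd), len(rev_rc))
    for olen in range(longest, min_overlap - 1, -1):
        mismatches = sum(a != b for a, b in zip(fwd[-olen:], rev_rc[:olen]))
        if mismatches <= max_mismatch:
            return fwd + rev_rc[olen:]
    return None


def direct_join(fwd, rev, spacer="N" * 10):
    """Concatenate a pair that lacks an overlap (spacer is an assumption)."""
    return fwd + spacer + reverse_complement(rev)


def merge_or_join(pairs, min_overlap=10):
    """Pool merged sequences with direct-joined ones, labelling each."""
    pooled = []
    for fwd, rev in pairs:
        merged = merge_pair(fwd, rev, min_overlap=min_overlap)
        if merged is not None:
            pooled.append(("merged", merged))
        else:
            pooled.append(("joined", direct_join(fwd, rev)))
    return pooled
```

A merging-only workflow would drop every pair for which `merge_pair` returns `None`. The combined workflow keeps those pairs in concatenated form, which is how it limits the loss of reads.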