A detailed workflow to develop QIIME2-formatted reference databases for taxonomic analysis of DNA metabarcoding data.

Author information

Life Sciences Department, Bioengineering Unit, Walloon Agricultural Research Center, Chaussée de Charleroi 234, 5030, Gembloux, Belgium.

Life Sciences Department, Plant and Forest Health Unit, Walloon Agricultural Research Center, Rue de Liroux 2, 5030, Gembloux, Belgium.

Publication information

BMC Genom Data. 2022 Jul 8;23(1):53. doi: 10.1186/s12863-022-01067-5.

Abstract

BACKGROUND

DNA metabarcoding has become one of the most widely used techniques for studying the taxonomic composition of various sample types. Handling the large amount of data generated by high-throughput sequencing requires a bioinformatics workflow, and the QIIME2 platform has emerged as one of the most reliable and commonly used. However, pre-formatted reference databases for assigning taxonomy are available for only a few barcode sequences. Users who want to develop a new custom reference database still face several bottlenecks, and a detailed procedure explaining how to develop and format such a database is currently missing. This work therefore presents a detailed workflow explaining, from start to finish, how to develop a curated reference database for any barcode sequence.

RESULTS

We developed DB4Q2, a detailed workflow that allowed development of plant reference databases dedicated to ITS2 and rbcL, two commonly used barcode sequences in plant metabarcoding studies. This workflow addresses several of the main bottlenecks connected with the development of a curated reference database. The detailed and commented structure of DB4Q2 offers the possibility of developing reference databases even without extensive bioinformatics skills, and avoids 'black box' systems that are sometimes encountered. Some filtering steps have been included to discard presumably fungal and misidentified sequences. The flexible character of DB4Q2 allows several key sequence processing steps to be included or not, and downloading issues can be avoided. Benchmarking the databases developed using DB4Q2 revealed that they performed well compared to previously published reference datasets.
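To make the filtering and formatting steps more concrete, the following is a minimal sketch in Python and not the DB4Q2 implementation itself. It assumes each reference record is available as a sequence ID, a lineage list, and a DNA sequence. It also assumes a Greengenes-like taxon string with rank prefixes (`k__`, `p__`, and so on); the function names, the example records, and the exact taxon format are hypothetical and may differ from what DB4Q2 produces.

```python
from typing import Iterable, List, Tuple

# Assumed rank prefixes (Greengenes-like style); the actual DB4Q2 format may differ.
RANK_PREFIXES = ["k__", "p__", "c__", "o__", "f__", "g__", "s__"]

# (sequence ID, lineage from kingdom to species, DNA sequence)
Record = Tuple[str, List[str], str]


def format_taxon(lineage: List[str]) -> str:
    """Join a lineage into a semicolon-delimited taxon string, padding missing ranks."""
    padded = lineage + [""] * (len(RANK_PREFIXES) - len(lineage))
    return "; ".join(prefix + name for prefix, name in zip(RANK_PREFIXES, padded))


def build_reference(
    records: Iterable[Record], excluded: Tuple[str, ...] = ("Fungi",)
) -> Tuple[str, str]:
    """Drop records whose lineage contains an excluded name, then return
    the FASTA text and the headerless two-column taxonomy TSV text."""
    fasta_lines = []
    taxonomy_lines = []
    for seq_id, lineage, sequence in records:
        if any(name in excluded for name in lineage):
            continue  # presumably non-plant sequence: discard
        fasta_lines.append(f">{seq_id}\n{sequence.upper()}\n")
        taxonomy_lines.append(f"{seq_id}\t{format_taxon(lineage)}\n")
    return "".join(fasta_lines), "".join(taxonomy_lines)


# Hypothetical example records, for illustration only.
example_records = [
    ("seq1",
     ["Viridiplantae", "Streptophyta", "Magnoliopsida", "Fabales",
      "Fabaceae", "Trifolium", "Trifolium repens"],
     "acgt"),
    ("seq2", ["Fungi", "Ascomycota"], "ggcc"),
]

fasta_text, taxonomy_text = build_reference(example_records)
print(fasta_text, end="")
print(taxonomy_text, end="")
```

Files with this shape (a FASTA file plus a headerless tab-separated taxonomy file keyed by the same IDs) can typically be imported into QIIME2 with `qiime tools import`, using the semantic types `FeatureData[Sequence]` and `FeatureData[Taxonomy]`, the latter with `--input-format HeaderlessTSVTaxonomyFormat`. Filtering on a lineage name is only one simple heuristic; detecting misidentified sequences would need additional checks that are not shown here.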

CONCLUSION

This study presents DB4Q2, a detailed procedure to develop custom reference databases in order to carry out taxonomic analyses with QIIME2, but also with other bioinformatics platforms if desired. This work also provides ready-to-use plant ITS2 and rbcL databases for which the prediction accuracy has been assessed and compared to that of other published databases.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2fe7/9264521/b4b3030a8c0b/12863_2022_1067_Fig1_HTML.jpg
