User-friendly bioinformatics pipeline gDAT (graphical downstream analysis tool) for analysing rDNA sequences.

Affiliations

Department of Botany, University of Tartu, Tartu, Estonia.

Institute of Plant Sciences, University of Bern, Bern, Switzerland.

Publication information

Mol Ecol Resour. 2021 May;21(4):1380-1392. doi: 10.1111/1755-0998.13340. Epub 2021 Feb 12.

Abstract

High-throughput sequencing (HTS) of multiple organisms in parallel (metabarcoding) has become a routine and cost-effective method for the analysis of microbial communities in environmental samples. However, careful data treatment is required to identify potential errors in HTS data, and the large volume of data generated by HTS requires in-house experience with command line tools for downstream analysis. This paper introduces gDAT, a pipeline that incorporates the most common command line tools into an easy-to-use graphical interface. By using the Python scripting language, the pipeline is compatible with the latest Windows, macOS and Linux operating systems. The pipeline supports analysis of Sanger, 454, IonTorrent, Illumina and PacBio sequences, allows custom modification of quality filtering steps, and implements both open and closed-reference operational taxonomic unit-picking for sequence identification. Predefined parameters are optimized for analysis of small subunit (SSU) rRNA gene amplicons from arbuscular mycorrhizal fungi, but the pipeline is widely applicable to metabarcoding studies targeting a broad range of organisms. The pipeline was additionally tested with data using general eukaryotic primers from the SSU gene region and fungal primers from the internal transcribed spacer (ITS) marker region. We describe the pipeline design and evaluate its performance and speed by conducting analysis of example data sets using different marker regions sequenced on Illumina platforms. The graphical interface, with the option to use the command line if needed, provides an accessible tool for rapid data analysis with repeatability and logging capabilities. Keeping the software open-source maximizes code accessibility, allowing scrutiny and bug fixes by the community.
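The difference between closed-reference and open-reference operational taxonomic unit (OTU) picking mentioned in the abstract can be illustrated with a toy sketch. This is not gDAT's code: the function names, the ungapped identity measure, the 0.9 threshold and the example sequences are all assumptions made for illustration, and real pipelines rely on alignment-based search tools rather than position-by-position comparison.

```python
from typing import Dict, List, Tuple


def identity(a: str, b: str) -> float:
    """Toy ungapped identity: matching positions divided by the longer length."""
    if not a or not b:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def closed_reference(
    reads: Dict[str, str], references: Dict[str, str], threshold: float
) -> Tuple[Dict[str, str], List[str]]:
    """Assign each read to its best reference; reads below threshold stay unmatched."""
    assigned: Dict[str, str] = {}
    unmatched: List[str] = []
    for read_id, seq in reads.items():
        best_ref, best_score = None, 0.0
        for ref_id, ref_seq in references.items():
            score = identity(seq, ref_seq)
            if score > best_score:
                best_ref, best_score = ref_id, score
        if best_ref is not None and best_score >= threshold:
            assigned[read_id] = best_ref
        else:
            unmatched.append(read_id)
    return assigned, unmatched


def open_reference(
    reads: Dict[str, str], references: Dict[str, str], threshold: float
) -> Dict[str, str]:
    """Closed-reference step first, then greedy de novo clustering of the leftovers."""
    assigned, unmatched = closed_reference(reads, references, threshold)
    centroids: Dict[str, str] = {}  # de novo OTU id -> centroid sequence
    for read_id in unmatched:
        seq = reads[read_id]
        for otu_id, centroid_seq in centroids.items():
            if identity(seq, centroid_seq) >= threshold:
                assigned[read_id] = otu_id
                break
        else:
            # No existing centroid is close enough: this read seeds a new OTU.
            otu_id = f"denovo{len(centroids)}"
            centroids[otu_id] = seq
            assigned[read_id] = otu_id
    return assigned


# Hypothetical example data (not from the paper).
references = {"refA": "ACGTACGTAC", "refB": "TTTTGGGGCC"}
reads = {
    "r1": "ACGTACGTAC",
    "r2": "ACGTACGTAT",
    "r3": "GGGGGGGGGG",
    "r4": "GGGGGGGGGA",
}

closed_hits, leftovers = closed_reference(reads, references, threshold=0.9)
open_hits = open_reference(reads, references, threshold=0.9)
print(closed_hits, leftovers)
print(open_hits)
```

In the closed-reference run, reads without a sufficiently similar reference are simply left unassigned, whereas the open-reference run groups them into new clusters. The trade-off is between comparability across studies (a fixed reference set) and the ability to retain sequences absent from the reference database.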

