University of Duisburg-Essen, Faculty of Biology, Aquatic Ecosystem Research, Essen 45141, Germany.
University of Duisburg-Essen, Centre for Water and Environmental Research (ZWU), Essen 45141, Germany.
Bioinformatics. 2022 Oct 14;38(20):4817-4819. doi: 10.1093/bioinformatics/btac588.
DNA metabarcoding is an emerging approach to assess and monitor biodiversity worldwide, and consequently the number and size of datasets are increasing exponentially. To date, no published DNA metabarcoding data processing pipeline exists that is (i) platform independent, (ii) easy to use [including a graphical user interface (GUI)], (iii) fast (scales well with dataset size) and (iv) compliant with the data protection regulations of, e.g., environmental agencies. The presented pipeline, APSCALE, meets these requirements and handles the most common tasks of sequence data processing, such as paired-end merging, primer trimming, quality filtering, clustering and denoising, for any popular metabarcoding marker, such as internal transcribed spacer, 16S or cytochrome c oxidase subunit I. APSCALE comes in a command-line and a GUI version. The latter provides the user with additional summary statistics options and links to GUI-based downstream applications.
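To make the quality filtering step more concrete, the following is a minimal sketch of one widely used criterion, the maximum expected error of a read, computed from its Phred quality string. The abstract does not state which criterion or threshold APSCALE applies, so the function names, the Phred+33 offset and the `max_ee` default of 1.0 are assumptions chosen for illustration only.

```python
def expected_errors(quality_string: str, phred_offset: int = 33) -> float:
    """Sum the per-base error probabilities of one read.

    A Phred score Q corresponds to an error probability of 10 ** (-Q / 10).
    The offset of 33 (Sanger / Illumina 1.8+ encoding) is an assumption.
    """
    return sum(10 ** (-(ord(ch) - phred_offset) / 10) for ch in quality_string)


def passes_quality_filter(quality_string: str, max_ee: float = 1.0) -> bool:
    """Keep a read only if its expected number of errors is at most max_ee.

    The threshold of 1.0 is a hypothetical default, not a documented setting.
    """
    return expected_errors(quality_string) <= max_ee


# Four bases at Q40 ('I') are kept; twelve bases at Q10 ('+') are discarded.
high_quality_kept = passes_quality_filter("IIII")
low_quality_kept = passes_quality_filter("+" * 12)
```

Filtering on the summed error probability, rather than on the mean quality score, penalises long reads with many mediocre bases, which is why this criterion is common in amplicon workflows.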
APSCALE is written in Python, a platform-independent language, and integrates functions of the open-source tools VSEARCH (Rognes et al., 2016), cutadapt (Martin, 2011) and LULU (Frøslev et al., 2017). All modules support multithreading to allow fast processing of larger DNA metabarcoding datasets. Further information and troubleshooting, including a detailed tutorial, are provided on the respective GitHub pages for the command-line version (https://github.com/DominikBuchner/apscale) and the GUI-based version (https://github.com/TillMacher/apscale_gui).
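The idea of processing many samples concurrently can be sketched as follows. This is not APSCALE's code or API: the function names are hypothetical, and primer trimming is reduced to an exact prefix match so that the example stays self-contained, whereas a tool such as cutadapt tolerates mismatches. In a real pipeline each worker would more plausibly launch an external tool per sample file.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional


def trim_forward_primer(read: str, primer: str) -> Optional[str]:
    """Return the read without its leading primer, or None if the primer is absent.

    Exact prefix matching is a simplification made for this sketch.
    """
    if read.startswith(primer):
        return read[len(primer):]
    return None


def process_sample(reads: List[str], primer: str) -> List[str]:
    """Trim the primer from every read and drop reads that lack it."""
    trimmed = (trim_forward_primer(read, primer) for read in reads)
    return [read for read in trimmed if read is not None]


def process_all_samples(
    samples: Dict[str, List[str]], primer: str, workers: int = 4
) -> Dict[str, List[str]]:
    """Process each sample in its own worker thread and collect results by name."""
    names = list(samples)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with names.
        results = pool.map(lambda name: process_sample(samples[name], primer), names)
        return dict(zip(names, results))


# Hypothetical toy input: two samples with a four-base forward primer.
demo = process_all_samples({"A": ["ACGTTTGA", "GGGTTT"], "B": ["ACGTCC"]}, "ACGT")
```

Because samples are independent of each other until the clustering stage, splitting the work per sample is a natural way to scale with dataset size.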
Supplementary data are available at Bioinformatics online.