School of Biological Sciences, University of Bristol, Bristol, UK.
School of Biological Sciences, Victoria University of Wellington, Wellington, New Zealand.
J Fish Biol. 2021 Oct;99(4):1446-1454. doi: 10.1111/jfb.14852. Epub 2021 Jul 28.
The accuracy and reliability of DNA metabarcoding analyses depend on the breadth and quality of the reference libraries that underpin them. However, there are limited options available to obtain and curate the huge volumes of sequence data that are available on public repositories such as NCBI and BOLD. Here, we provide a pipeline to download, clean and annotate mitochondrial DNA sequence data for a given list of fish species. Features of this pipeline include (a) support for multiple metabarcode markers; (b) searches on species synonyms and taxonomic name validation; (c) phylogeny-assisted quality control for identification and removal of misannotated sequences; (d) automatically generated coverage reports for each new GenBank release update; and (e) citable, versioned DOIs. As an example, we provide a ready-to-use curated reference library for the marine and freshwater fishes of the U.K. To augment this reference library specifically for environmental DNA metabarcoding, we generated 241 new MiFish-12S sequences for 88 U.K. marine species, and make available new primer sets for sequencing this fragment. This brings the coverage of common U.K. species for the MiFish-12S fragment to 93%, opening new avenues for scaling up fish metabarcoding across wide spatial gradients. The Meta-Fish-Lib reference library and pipeline are hosted at https://github.com/genner-lab/meta-fish-lib.
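A coverage figure such as the 93% quoted above is, at its simplest, the percentage of target species that have at least one sequence for a given marker, after synonyms have been resolved to valid names (features b and d). The following is a minimal Python sketch under those assumptions only. The function `coverage_report`, its parameters, and the `(species_name, marker)` record format are hypothetical and are not Meta-Fish-Lib's actual code or output format.

```python
def coverage_report(species, records, markers, synonyms=None):
    """Hypothetical sketch: percentage of target species with >= 1 sequence per marker.

    species  -- list of valid (accepted) species names to be covered
    records  -- iterable of (species_name, marker) pairs, e.g. parsed from downloads
    markers  -- marker labels to report on, in the order they should appear
    synonyms -- optional dict mapping a synonym to its valid species name
    """
    synonyms = synonyms or {}
    targets = set(species)
    if not targets:
        raise ValueError("species list must not be empty")
    # marker -> set of valid species names with at least one sequence
    covered = {m: set() for m in markers}
    for name, marker in records:
        valid = synonyms.get(name, name)  # resolve synonyms to the valid name
        if valid in targets and marker in covered:
            covered[marker].add(valid)  # duplicates per species count once
    return {m: 100.0 * len(names) / len(targets) for m, names in covered.items()}


if __name__ == "__main__":
    # Toy input with placeholder names; not real library data.
    species = ["Species A", "Species B", "Species C", "Species D"]
    records = [
        ("Species A", "MiFish-12S"),
        ("Old name B", "MiFish-12S"),  # synonym of Species B
        ("Species A", "COI"),
        ("Species X", "MiFish-12S"),   # not in the target list, ignored
    ]
    synonyms = {"Old name B": "Species B"}
    print(coverage_report(species, records, ["MiFish-12S", "COI", "16S"], synonyms))
    # {'MiFish-12S': 50.0, 'COI': 25.0, '16S': 0.0}
```

Counting species rather than sequences keeps the figure a measure of taxonomic breadth: many sequences for one well-studied species do not inflate coverage, and a marker with no hits is still reported as 0%.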