coil: an R package for cytochrome oxidase I (COI) DNA barcode data cleaning, translation, and error evaluation.

Affiliations

Department of Integrative Biology, University of Guelph. Guelph, Ontario, Canada.

Centre for Biodiversity Genomics, Biodiversity Institute of Ontario, University of Guelph. Guelph, Ontario, Canada.

Publication information

Genome. 2020 Jun;63(6):291-305. doi: 10.1139/gen-2019-0206. Epub 2020 May 14.

Abstract

Biological conclusions based on DNA barcoding and metabarcoding analyses can be strongly influenced by the methods utilized for data generation and curation, leading to varying levels of success in the separation of biological variation from experimental error. The 5' region of cytochrome oxidase subunit I (COI-5P) is the most common barcode gene for animals, with conserved structure and function that allows for biologically informed error identification. Here, we present coil (https://CRAN.R-project.org/package=coil), an R package for the pre-processing and frameshift error assessment of COI-5P animal barcode and metabarcode sequence data. The package contains functions for placement of barcodes into a common reading frame, accurate translation of sequences to amino acids, and highlighting insertion and deletion errors. The analysis of 10 000 barcode sequences of varying quality demonstrated how coil can place barcode sequences in reading frame and distinguish sequences containing indel errors from error-free sequences with greater than 97.5% accuracy. Package limitations were tested through the analysis of COI-5P sequences from the plant and fungal kingdoms as well as the analysis of potential contaminants: nuclear mitochondrial pseudogenes and COI-5P sequences. Results demonstrated that coil is a strong technical error identification method but is not reliable for detecting all biological contaminants.
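
The idea of using translation to expose frameshift errors can be illustrated with a small, self-contained Python sketch. It is not coil's implementation and does not use the R package; the abstract does not describe the package internals. The sketch assumes the invertebrate mitochondrial genetic code (NCBI translation table 5) and uses a deliberately crude proxy: a sequence is treated as suspicious when every one of its three forward reading frames contains a stop codon. The function names `translate`, `stop_free_frames`, and `looks_frameshifted` are hypothetical and chosen only for this illustration.

```python
from itertools import product

# Assumption: invertebrate mitochondrial code (NCBI translation table 5).
# Codons are enumerated with bases in T, C, A, G order, matching the
# amino-acid string below; "*" marks a stop codon (TAA and TAG here).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CCWW"
    "LLLLPPPPHHQQRRRR"
    "IIMMTTTTNNKKSSSS"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq, offset=0):
    """Translate complete codons of seq, starting at the given offset (0-2).

    Codons containing characters other than A, C, G, T become "X".
    """
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def stop_free_frames(seq):
    """Return the forward frame offsets whose translation has no stop codon."""
    return [offset for offset in range(3) if "*" not in translate(seq, offset)]


def looks_frameshifted(seq):
    """Crude proxy: flag a sequence when no forward frame is stop-free."""
    return not stop_free_frames(seq)


if __name__ == "__main__":
    clean = "CTAATTAGTCTAATTAGT"           # toy sequence, open in frame 0 only
    shifted = clean[:9] + clean[10:]       # same sequence with one base deleted
    print(stop_free_frames(clean), translate(clean))
    print(looks_frameshifted(shifted))
```

A stop-codon scan like this only catches indels that happen to introduce a stop downstream, and it cannot locate the error, so it should be read as a toy demonstration of why placing barcodes in a common reading frame and translating them helps separate technical errors from biological variation.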
