

Efficient COI barcoding using high throughput single-end 400 bp sequencing.

Affiliations

BGI-Shenzhen, Shenzhen, 518083, China.

College of Life Sciences, Capital Normal University, Beijing, 100048, China.

Publication information

BMC Genomics. 2020 Dec 4;21(1):862. doi: 10.1186/s12864-020-07255-w.

Abstract

BACKGROUND

Over the last decade, the rapid development of high-throughput sequencing platforms has accelerated species description and assisted morphological classification through DNA barcoding. However, current high-throughput DNA barcoding methods either cannot obtain full-length barcode sequences because of read length limitations (e.g. a maximum read length of 300 bp for Illumina's MiSeq system) or are hindered by relatively high cost or low sequencing output (e.g. a maximum of eight million reads per cell for PacBio's SEQUEL II system).

RESULTS

Pooled cytochrome c oxidase subunit I (COI) barcodes from individual specimens were sequenced on the MGISEQ-2000 platform using the single-end 400 bp (SE400) module. We present a bioinformatic pipeline, HIFI-SE, that takes reads generated from the 5' and 3' ends of the COI barcode region and assembles them into full-length barcodes. HIFI-SE is written in Python and comprises four functional modules: filter, assign, assembly and taxonomy. We applied HIFI-SE to a set of 845 samples (30 marine invertebrates, 815 insects) and obtained a total of 747 fully assembled COI barcodes as well as 70 Wolbachia and fungal symbionts. Of the 72 samples with corresponding Sanger sequences available, nearly all (71/72) were correctly and accurately assembled, including 46 samples with a similarity score of 100% and 25 with ca. 99%.
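
The assembly idea described above, joining a read from the 5' end with a read from the 3' end through their shared overlap, can be illustrated with a minimal sketch. This is not the HIFI-SE implementation: the function names, the overlap and mismatch thresholds, the 658 bp barcode length, and the assumption that the 3' read is sequenced on the opposite strand are all illustrative choices that the abstract does not specify. A simple ungapped percent-identity helper stands in for the similarity score used in the Sanger comparison.

```python
import random


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def merge_end_reads(fwd, rev, min_overlap=20, max_mismatch_rate=0.02):
    """Join a 5'-end read and a 3'-end read into one sequence.

    Hypothetical helper: `rev` is assumed to come from the opposite
    strand, so it is reverse-complemented first. Candidate overlaps are
    tried from longest to shortest, and the first one within the
    mismatch tolerance is accepted. Returns None if no overlap fits.
    """
    rev_rc = reverse_complement(rev)
    for olen in range(min(len(fwd), len(rev_rc)), min_overlap - 1, -1):
        tail, head = fwd[-olen:], rev_rc[:olen]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_rate * olen:
            # Keep the forward read whole and append the non-overlapping part.
            return fwd + rev_rc[olen:]
    return None


def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


# Toy data: a random 658 bp "barcode" (assumed length) read 400 bp from each end.
random.seed(0)
barcode = "".join(random.choice("ACGT") for _ in range(658))
fwd_read = barcode[:400]
rev_read = reverse_complement(barcode[-400:])

merged = merge_end_reads(fwd_read, rev_read)
identity = percent_identity(merged, barcode)
```

With two 400 bp reads and a barcode shorter than 800 bp, the reads necessarily overlap in the middle, which is what makes full-length reconstruction from single-end reads possible. A production pipeline would additionally need to handle sequencing errors, primer and index removal, and contaminant reads, which this sketch ignores.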

CONCLUSIONS

The HIFI-SE pipeline represents an efficient way to produce standard full-length barcodes, and the reasonable cost and high sensitivity of our method allow considerably more DNA barcodes to be produced under the same budget. Our method thereby advances DNA-based species identification from diverse ecosystems and increases the number of relevant applications.

