Efficient COI barcoding using high throughput single-end 400 bp sequencing.

Affiliations

BGI-Shenzhen, Shenzhen, 518083, China.

College of Life Sciences, Capital Normal University, Beijing, 100048, China.

Publication information

BMC Genomics. 2020 Dec 4;21(1):862. doi: 10.1186/s12864-020-07255-w.

DOI:10.1186/s12864-020-07255-w

PMID:33276723

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7716423/

Abstract

BACKGROUND

Over the last decade, the rapid development of high-throughput sequencing platforms has accelerated species description and assisted morphological classification through DNA barcoding. However, current high-throughput DNA barcoding methods either cannot obtain full-length barcode sequences because of read length limitations (e.g. a maximum read length of 300 bp for Illumina's MiSeq system) or are hindered by relatively high cost or low sequencing output (e.g. a maximum of eight million reads per cell for PacBio's Sequel II system).

RESULTS

Pooled cytochrome c oxidase subunit I (COI) barcodes from individual specimens were sequenced on the MGISEQ-2000 platform using the single-end 400 bp (SE400) module. We present a bioinformatic pipeline, HIFI-SE, that takes reads generated from the 5' and 3' ends of the COI barcode region and assembles them into full-length barcodes. HIFI-SE is written in Python and comprises four functional modules: filter, assign, assembly and taxonomy. We applied HIFI-SE to a set of 845 samples (30 marine invertebrates, 815 insects) and recovered a total of 747 fully assembled COI barcodes as well as 70 Wolbachia and fungal symbionts. Compared with their corresponding Sanger sequences (72 sequences available), nearly all samples (71/72) were correctly and accurately assembled, including 46 samples with a similarity score of 100% and 25 with ca. 99%.
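The idea of joining a 5'-end read and a 3'-end read into one full-length barcode can be illustrated with a minimal sketch. This is not the HIFI-SE implementation, whose assembly algorithm is not described in the abstract. The function names, the `min_overlap` and `max_mismatch_rate` parameters, and the 658 bp barcode length (the commonly used COI barcode length, not a figure from the abstract) are all assumptions made for illustration. Under those assumptions, two 400 bp reads overlap by 400 + 400 - 658 = 142 bp.

```python
import random
from typing import Optional


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def merge_by_overlap(five_prime: str, three_prime: str,
                     min_overlap: int = 20,
                     max_mismatch_rate: float = 0.0) -> Optional[str]:
    """Join two same-strand reads at their longest acceptable overlap.

    five_prime  : read starting at the 5' end of the barcode
    three_prime : read covering the 3' end, already reverse-complemented
                  so that it is on the same strand as five_prime
    Returns the merged sequence, or None if no overlap of at least
    min_overlap bases passes the mismatch threshold (hypothetical rule).
    """
    longest = min(len(five_prime), len(three_prime))
    for size in range(longest, min_overlap - 1, -1):
        tail = five_prime[-size:]
        head = three_prime[:size]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_rate * size:
            return five_prime + three_prime[size:]
    return None


def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


# Toy data: a simulated 658 bp barcode (assumed length) and two SE400 reads.
rng = random.Random(0)
barcode = "".join(rng.choice("ACGT") for _ in range(658))
read_5p = barcode[:400]                        # read from the 5' end
read_3p = reverse_complement(barcode[-400:])   # read from the 3' end, as sequenced

merged = merge_by_overlap(read_5p, reverse_complement(read_3p))
```

In this toy case `merged` reproduces the simulated barcode, and `percent_identity(merged, barcode)` plays the role of the similarity score used when comparing assembled barcodes with Sanger references. Real reads contain sequencing errors, so a practical pipeline would need quality filtering and a non-zero mismatch tolerance.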

CONCLUSIONS

The HIFI-SE pipeline represents an efficient way to produce standard full-length barcodes, while the reasonable cost and high sensitivity of our method can contribute considerably more DNA barcodes under the same budget. Our method thereby advances DNA-based species identification from diverse ecosystems and increases the number of relevant applications.

Similar articles

Filling reference gaps via assembling DNA barcodes using high-throughput sequencing-moving toward barcoding the world.

Gigascience. 2017 Dec 1;6(12):1-8. doi: 10.1093/gigascience/gix104.

Next-generation DNA barcoding: using next-generation sequencing to enhance and accelerate DNA barcode capture from single specimens.

Mol Ecol Resour. 2014 Sep;14(5):892-901. doi: 10.1111/1755-0998.12236. Epub 2014 Feb 19.

Sorting specimen-rich invertebrate samples with cost-effective NGS barcodes: Validating a reverse workflow for specimen processing.

Mol Ecol Resour. 2018 May;18(3):490-501. doi: 10.1111/1755-0998.12751. Epub 2018 Feb 2.

Illumina midi-barcodes: quality proof and applications.

Mitochondrial DNA A DNA Mapp Seq Anal. 2019 Apr;30(3):490-499. doi: 10.1080/24701394.2018.1551386. Epub 2019 Jan 11.

Pyrosequencing for mini-barcoding of fresh and old museum specimens.

PLoS One. 2011;6(7):e21252. doi: 10.1371/journal.pone.0021252. Epub 2011 Jul 27.

Skimming for barcodes: rapid production of mitochondrial genome and nuclear ribosomal repeat reference markers through shallow shotgun sequencing.

PeerJ. 2022 Aug 5;10:e13790. doi: 10.7717/peerj.13790. eCollection 2022.

Estimating the biodiversity of terrestrial invertebrates on a forested island using DNA barcodes and metabarcoding data.

Ecol Appl. 2019 Jun;29(4):e01877. doi: 10.1002/eap.1877. Epub 2019 Apr 8.

DNA barcoding of marine metazoa.

Ann Rev Mar Sci. 2011;3:471-508. doi: 10.1146/annurev-marine-120308-080950.

A Rapid and Cost-Effective Identification of Invertebrate Pests at the Borders Using MinION Sequencing of DNA Barcodes.

Genes (Basel). 2021 Jul 27;12(8):1138. doi: 10.3390/genes12081138.

Cited by

Development of mitochondrial DNA cytochrome oxidase subunit I primer sets to construct DNA barcoding library using next-generation sequencing.

Biodivers Data J. 2024 Jun 18;12:e117014. doi: 10.3897/BDJ.12.e117014. eCollection 2024.

Application and Comparison of Machine Learning and Database-Based Methods in Taxonomic Classification of High-Throughput Sequencing Data.

Genome Biol Evol. 2024 May 2;16(5). doi: 10.1093/gbe/evae102.

taxalogue: a toolkit to create comprehensive CO1 reference databases.

PeerJ. 2023 Dec 4;11:e16253. doi: 10.7717/peerj.16253. eCollection 2023.

Strategies for sample labelling and library preparation in DNA metabarcoding studies.

Mol Ecol Resour. 2022 May;22(4):1231-1246. doi: 10.1111/1755-0998.13512. Epub 2021 Oct 13.
