rCRUX: A Rapid and Versatile Tool for Generating Metabarcoding Reference libraries in R.

Authors

Curd Emily E, Gal Luna, Gallego Ramon, Nielsen Shaun, Gold Zachary

Affiliations

Vermont Biomedical Research Network, University of Vermont, VT, USA.

Landmark College, VT, USA.

Publication details

bioRxiv. 2023 Jun 3:2023.05.31.543005. doi: 10.1101/2023.05.31.543005.

DOI:10.1101/2023.05.31.543005
PMID:37397980
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10312559/
Abstract

Key to making accurate taxonomic assignments are curated, comprehensive reference barcode databases. However, the generation and curation of such databases has remained challenging given the large and continuously growing volumes of DNA sequence data and novel reference barcode targets. Monitoring and research applications require a greater diversity of specialized gene regions and targeted taxa to meet taxonomic classification goals than are currently curated by professional staff. Thus, there is a growing need for an easy-to-implement tool that can generate comprehensive metabarcoding reference libraries for any bespoke locus. We address this need by reimagining CRUX from the Anacapa Toolkit and present the rCRUX package in R. The typical workflow involves searching for plausible seed amplicons (() or ()) by simulating PCR to acquire seed sequences containing a user-defined primer set. Next, these seeds are used to iteratively BLAST search seed sequences against a local NCBI-formatted database using a taxonomic-rank-based stratified random sampling approach (()) that results in a comprehensive set of sequence matches. This database is dereplicated and cleaned (()) by identifying identical reference sequences and collapsing the taxonomic path to the lowest taxonomic agreement across all matching reads. This results in a curated, comprehensive database of primer-specific reference barcode sequences from NCBI. We demonstrate that rCRUX provides more comprehensive reference databases for the MiFish Universal Teleost 12S, Taberlet trnL, and fungal ITS loci than CRABS, METACURATOR, RESCRIPt, and ECOPCR reference databases. We then further demonstrate the utility of rCRUX by generating 16 reference databases for metabarcoding loci that lack dedicated reference database curation efforts.
The rCRUX package provides a simple-to-use tool for the generation of curated, comprehensive reference databases for user-defined loci, facilitating accurate and effective taxonomic classification of metabarcoding and DNA sequence efforts broadly.
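
Two steps of the workflow described in the abstract can be made concrete with a small sketch: stratified random sampling by taxonomic rank, and dereplication with collapse to the lowest taxonomic agreement. The Python below is an illustration only and is not the rCRUX API. The function names, the record layout, the seven-rank list, the placeholder taxon names, and the use of `"NA"` for unresolved ranks are all assumptions made for this example.

```python
import random
from collections import defaultdict

# Assumed rank order for this sketch (not taken from the source).
RANKS = ["superkingdom", "phylum", "class", "order", "family", "genus", "species"]


def stratified_sample(records, rank, max_per_group, seed=0):
    """Draw up to `max_per_group` records at random from each taxon at `rank`."""
    rng = random.Random(seed)
    idx = RANKS.index(rank)
    groups = defaultdict(list)
    for rec in records:
        groups[rec["taxonomy"][idx]].append(rec)
    sample = []
    for taxon in sorted(groups):  # sorted for a reproducible group order
        members = groups[taxon]
        sample.extend(rng.sample(members, min(max_per_group, len(members))))
    return sample


def dereplicate(records):
    """Merge identical sequences and keep the taxonomic path only down to
    the lowest rank on which every copy of the sequence agrees."""
    paths_by_seq = defaultdict(list)
    for rec in records:
        paths_by_seq[rec["sequence"]].append(rec["taxonomy"])
    cleaned = {}
    for seq, paths in paths_by_seq.items():
        consensus, agree = [], True
        for names in zip(*paths):  # walk the ranks from highest to lowest
            agree = agree and len(set(names)) == 1
            consensus.append(names[0] if agree else "NA")
        cleaned[seq] = consensus
    return cleaned


# Toy records with placeholder taxon names (hypothetical data).
HIGHER = ["Eukaryota", "Phy1", "Cls1"]
records = [
    {"id": "a1", "sequence": "ACGTAC",
     "taxonomy": HIGHER + ["Ord1", "Fam1", "Gen1", "Gen1 sp1"]},
    {"id": "a2", "sequence": "ACGTAC",
     "taxonomy": HIGHER + ["Ord1", "Fam1", "Gen1", "Gen1 sp2"]},
    {"id": "a3", "sequence": "TTGACA",
     "taxonomy": HIGHER + ["Ord2", "Fam2", "Gen2", "Gen2 sp1"]},
]

if __name__ == "__main__":
    for rec in stratified_sample(records, "family", max_per_group=1):
        print(rec["id"], rec["taxonomy"][RANKS.index("family")])
    for seq, path in dereplicate(records).items():
        print(seq, ";".join(path))
```

In this toy data, records `a1` and `a2` share a sequence but differ at species level, so the merged entry is resolved only to genus. Sampling a capped number of records per taxon keeps heavily sequenced groups from dominating the next search round, which is the intuition behind a rank-based stratified approach.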

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4832/10312559/d51a4bc25365/nihpp-2023.05.31.543005v1-f0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4832/10312559/69aa0beef4b6/nihpp-2023.05.31.543005v1-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4832/10312559/3b9aa1c11167/nihpp-2023.05.31.543005v1-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4832/10312559/00c133f3cc6d/nihpp-2023.05.31.543005v1-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4832/10312559/ec6f8e893875/nihpp-2023.05.31.543005v1-f0004.jpg
