ARA: a flexible pipeline for automated exploration of NCBI SRA datasets.

Affiliation

Department of Computational Biology, Institute of Molecular Biology and Biotechnology, Faculty of Biology, Adam Mickiewicz University in Poznan, Uniwersytetu Poznanskiego 6, 61-614 Poznan, Poland.

Publication information

Gigascience. 2022 Dec 28;12. doi: 10.1093/gigascience/giad067. Epub 2023 Aug 17.

DOI:10.1093/gigascience/giad067
PMID:37589306
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10433097/
Abstract

BACKGROUND

One of the most effective and useful methods to explore the content of biological databases is searching with nucleotide or protein sequences as a query. However, especially in the case of nucleic acids, due to the large volume of data generated by the next-generation sequencing (NGS) technologies, this approach is often not available. The hierarchical organization of the NGS records is primarily designed for browsing or text-based searches of the information provided in metadata-related keywords, limiting the efficiency of database exploration.

FINDINGS

We developed an automated pipeline that incorporates the well-established NGS data-processing tools and procedures to allow easy and effective sampling of the NCBI SRA database records. Given a file with query nucleotide sequences, our tool estimates the matching content of SRA accessions by probing only a user-defined fraction of a record's sequences. Based on the selected parameters, it allows performing a full mapping experiment with records that meet the required criteria. The pipeline is designed to be easy to operate: it offers a fully automatic setup procedure and is fixed on tested supporting tools. The modular design and implemented usage modes allow a user to scale up the analyses into complex computational infrastructure.
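
The sample-then-decide idea described above can be illustrated with a minimal Python sketch. This is not ARA's implementation: the abstract does not specify how matching is scored, so the sketch substitutes a simple shared k-mer test for a real aligner, and all names here (estimate_match_fraction, should_run_full_mapping, fraction, threshold, k) are hypothetical.

```python
import random


def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def estimate_match_fraction(reads, query, fraction=0.1, k=8, seed=0):
    """Estimate the share of reads resembling the query from a random subsample.

    Assumption: a read "matches" if it shares at least one k-mer with the
    query. A real pipeline would use a read mapper instead of this stand-in.
    """
    if not reads:
        return 0.0
    rng = random.Random(seed)
    # Probe only a user-defined fraction of the record (at least one read).
    sample_size = min(len(reads), max(1, int(len(reads) * fraction)))
    sample = rng.sample(reads, sample_size)
    query_kmers = kmers(query, k)
    hits = sum(1 for read in sample if kmers(read, k) & query_kmers)
    return hits / sample_size


def should_run_full_mapping(reads, query, threshold=0.05, **kwargs):
    """Decide whether a record passes the screening step (hypothetical criterion)."""
    return estimate_match_fraction(reads, query, **kwargs) >= threshold
```

In this sketch, only records whose estimated match fraction reaches the threshold would proceed to the more expensive full mapping step, which is the trade-off the abstract describes: a small sample is cheap to inspect, at the cost of some uncertainty in the estimate.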

CONCLUSIONS

We present an easy-to-operate and automated tool that expands the way a user can access and explore the information contained within the records deposited in the NCBI SRA database.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5efb/10433097/d12891d0eaf8/giad067fig1.jpg
