getSequenceInfo: a suite of tools allowing to get genome sequence information from public repositories.

Affiliations

Unité Transmission, Réservoir et Diversité des Pathogènes, Institut Pasteur de Guadeloupe, Les Abymes, Guadeloupe, France.

Faculté de Médecine Hyacinthe Bastaraud, Université des Antilles, Pointe-à-Pitre, France.

Publication information

BMC Bioinformatics. 2022 Jul 8;23(1):268. doi: 10.1186/s12859-022-04809-5.

Abstract

BACKGROUND

The volume of biological sequence data is growing exponentially worldwide. Nucleotide sequence databases play an important role in providing meaningful genomic information on a wide variety of organisms.

RESULTS

The getSequenceInfo software tool retrieves sequence information from several public repositories (GenBank, RefSeq, and the European Nucleotide Archive). It runs on Linux, macOS, and Microsoft Windows, either programmatically (command line) or through a graphical user interface. getSequenceInfo (gSeqI) v1.0 helps users obtain information on queried sequences that may be useful for specific studies, e.g. the country of origin/isolation or the release date of those sequences. Queries can retrieve sequence data for a given kingdom and species, or from a given date. The program separates chromosomes from plasmids (or other genetic elements/components) by placing each component in a dedicated folder. It also computes some basic statistics, such as the GC content of queried assemblies. An empirically designed nucleotide ratio is calculated from nucleotide information to tentatively provide a "NucleScore" for the genome assemblies studied. Besides the main gSeqI tool, additional tools have been developed to perform various tasks related to sequence analysis.
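To make the basic statistics and the chromosome/plasmid separation more concrete, here is a minimal Python sketch. It is not gSeqI's own code. The function names, the rule of classifying a record by whether its FASTA header contains the word "plasmid", and the example sequences are all assumptions made for illustration. The NucleScore is not reproduced because the abstract does not give its formula.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (FASTA header without '>', sequence)


def parse_fasta(text: str) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gc_content(sequence: str) -> float:
    """Return the GC percentage, counting only unambiguous bases (A, C, G, T)."""
    sequence = sequence.upper()
    counts = {base: sequence.count(base) for base in "ACGT"}
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 100.0 * (counts["G"] + counts["C"]) / total


def split_components(records: Iterable[Record]) -> Dict[str, List[Record]]:
    """Group records by a header keyword (assumed heuristic, not gSeqI's rule)."""
    groups: Dict[str, List[Record]] = {"chromosome": [], "plasmid": []}
    for header, seq in records:
        key = "plasmid" if "plasmid" in header.lower() else "chromosome"
        groups[key].append((header, seq))
    return groups


# Made-up records used only to exercise the functions above.
EXAMPLE_FASTA = """>seq1 hypothetical organism chromosome
ATGCGC
GGCC
>seq2 hypothetical organism plasmid pX
ATATATGC
"""

if __name__ == "__main__":
    groups = split_components(parse_fasta(EXAMPLE_FASTA))
    for component, records in groups.items():
        for header, seq in records:
            print(f"{component}\t{header}\t{gc_content(seq):.1f}% GC")
```

Ambiguous bases such as N are left out of the denominator here. That is one common convention, and the tool itself may handle them differently.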

CONCLUSION

This study aims to make programmatic access to public repositories more widely available and to facilitate sequence data analysis from a pedagogical perspective. Output results are available in FASTA, FASTQ, Excel/TSV, or HTML formats. The program is freely available at https://github.com/karubiotools/getSequenceInfo. getSequenceInfo and its supplementary tools are partly available through the recently released Galaxy KaruBioNet platform (http://calamar.univ-ag.fr/c3i/galaxy_karubionet.html).
