gff2sequence, a new user-friendly tool for the generation of genomic sequences.

Affiliation

Dipartimento di Agraria, Università degli Studi di Sassari, Sassari 07100, Italy.

Publication information

BioData Min. 2013 Sep 11;6(1):15. doi: 10.1186/1756-0381-6-15.

Abstract

BACKGROUND

General Feature Format (GFF) files are used to store genome features such as genes, exons, introns and primary transcripts. Although many software packages (e.g. ab initio gene prediction programs) can annotate features using this standard, only a small number of tools have been developed to extract the corresponding sequence information from the original genome. Moreover, the existing tools neither perform quality control nor offer a customizable filter for the annotated features.

FINDINGS

gff2sequence is a program that extracts nucleotide/protein sequences from a genomic multi-FASTA file by using the information provided by a General Feature Format file. A graphical user interface makes the software very easy to use, while the underlying C++ algorithm delivers high performance with low hardware demand. The software can also extract genic portions such as the untranslated and coding sequences. Moreover, a highly customizable quality control pipeline can be used to deal with anomalous splicing sites, incorrect open reading frames and non-canonical characters within the retrieved sequences.
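To make the extraction step concrete, the following is a minimal Python sketch of pulling feature sequences out of a multi-FASTA genome using GFF coordinates. It is not gff2sequence's own code (the tool is written in C++ and its internals are not described here); the function names `parse_fasta` and `extract_features` are hypothetical, and the sketch assumes tab-separated nine-column GFF records with 1-based inclusive coordinates.

```python
# Illustrative sketch only: helper names are hypothetical and this is
# not the gff2sequence implementation.

# Translation table used to complement a nucleotide string.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def parse_fasta(text):
    """Parse multi-FASTA text into a dict {sequence_id: sequence}."""
    sequences, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                sequences[name] = "".join(chunks)
            header = line[1:].split()
            # Use the first whitespace-delimited token as the sequence ID.
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        sequences[name] = "".join(chunks)
    return sequences


def extract_features(gff_text, genome, feature_type="CDS"):
    """Yield (seqid, start, end, strand, sequence) for matching GFF records."""
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments/directives
        cols = line.split("\t")
        if len(cols) < 9 or cols[2] != feature_type:
            continue
        seqid, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        # GFF coordinates are 1-based and inclusive; Python slices are
        # 0-based and end-exclusive.
        seq = genome[seqid][start - 1:end]
        if strand == "-":
            # Minus-strand features are reported as the reverse complement.
            seq = seq.translate(COMPLEMENT)[::-1]
        yield seqid, start, end, strand, seq
```

Calling `extract_features(gff_text, parse_fasta(fasta_text))` yields one tuple per `CDS` record; passing a different `feature_type` (for example `"exon"`) selects other feature rows, assuming the GFF file uses that label.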

CONCLUSIONS

gff2sequence is a user-friendly program that allows the generation of highly customizable sequence datasets by processing a General Feature Format file. Its wide range of quality filters also makes the tool suitable for refining ab initio gene predictions.
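As an illustration of what such quality filters might check, the sketch below flags the three problem classes named in the abstract: non-canonical characters, incorrect open reading frames and anomalous splice sites. The specific rules (an ATG start codon, a terminal stop codon, no internal stops, GT...AG intron boundaries) are common conventions assumed here for the example, not the documented behaviour of gff2sequence, and the function names are hypothetical.

```python
# Illustrative sketch only: the rules below are assumed conventions,
# not the filters actually implemented by gff2sequence.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_cds(cds):
    """Return a list of problems found in a coding sequence (empty if none)."""
    cds = cds.upper()
    problems = []
    # Anything other than A, C, G or T is treated as non-canonical.
    if set(cds) - set("ACGT"):
        problems.append("non-canonical characters")
    if len(cds) % 3 != 0:
        problems.append("length not a multiple of 3")
    # Split into complete codons, ignoring any trailing partial codon.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    if not codons or codons[0] != "ATG":
        problems.append("missing start codon")
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("missing stop codon")
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("internal stop codon")
    return problems


def has_canonical_splice_sites(intron):
    """True if the intron starts with GT and ends with AG."""
    intron = intron.upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")
```

A filter built on these helpers could keep only predictions for which `check_cds` returns an empty list and every intron passes `has_canonical_splice_sites`, which is one plausible way to refine an ab initio gene set.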

