Estimating overannotation across prokaryotic genomes using BLAST+, UBLAST, LAST and BLAT.

Authors

Moreno-Hagelsieb Gabriel, Hudy-Yuffa Brigitte

Affiliation

Department of Biology, Wilfrid Laurier University, 75 University Ave W, Waterloo, ON N2L 3C5, Canada.

Publication

BMC Res Notes. 2014 Sep 16;7:651. doi: 10.1186/1756-0500-7-651.

Abstract

BACKGROUND

As the number of genomes in public databases increases, it becomes more important to be able to quickly choose the best-annotated genomes for further analyses in comparative genomics and evolution. One proxy for annotation quality is an estimate of overannotation obtained by comparing annotated coding genes against the SwissProt database. NCBI's BLAST (BLAST+) is the usual software of choice for comparing these sequences. Newer programs that run in a fraction of the time BLAST+ takes might miss matches that BLAST+ would find. However, their results might still be useful for calculating overannotation. We thus decided to compare the overannotation estimates obtained using three such programs, UBLAST, LAST and the BLAST-Like Alignment Tool (BLAT), and to test non-redundant versions of the SwissProt database to reduce the number of comparisons necessary.
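The comparison step described above can be sketched in a few lines. This is only an illustration, not the authors' procedure: it assumes the search results are in 12-column tab-separated form (query ID in column 1, e-value in column 11, as BLAST+ writes with `-outfmt 6`), and it uses the plain fraction of annotated proteins without a SwissProt hit as a crude stand-in for an overannotation estimate. The function names, the e-value cutoff and the IDs in the example are all hypothetical.

```python
import csv
import io


def proteins_with_hits(tabular_text, max_evalue=1e-6):
    """Return the set of query IDs with at least one hit at or below max_evalue.

    Assumes 12-column tab-separated output: query ID in column 1 and
    e-value in column 11. The cutoff is an arbitrary illustrative value.
    """
    matched = set()
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        # Skip comment lines and anything that is not a full 12-column record.
        if len(row) < 12 or row[0].startswith("#"):
            continue
        if float(row[10]) <= max_evalue:
            matched.add(row[0])
    return matched


def unmatched_fraction(all_protein_ids, matched_ids):
    """Fraction of annotated proteins with no accepted hit (a crude proxy only)."""
    ids = set(all_protein_ids)
    if not ids:
        raise ValueError("no annotated proteins given")
    return len(ids - matched_ids) / len(ids)


# Made-up records: p1 has a strong hit, p2 only a weak one, p3 and p4 have none.
example = "\n".join([
    "p1\tswissprot_a\t98.0\t350\t7\t0\t1\t350\t1\t350\t1e-180\t600",
    "p2\tswissprot_b\t30.0\t50\t35\t0\t1\t50\t1\t50\t0.5\t25",
])
matched = proteins_with_hits(example)
fraction = unmatched_fraction(["p1", "p2", "p3", "p4"], matched)
```

Because UBLAST, LAST and BLAT can all be asked for tabular output, the same counting step could in principle be reused across programs, provided the column layout is checked for each one first.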

FINDINGS

We found that all three programs, UBLAST, LAST and BLAT, tend to produce overannotation estimates similar to those obtained with BLAST+. As would be expected, results differed most from those obtained with BLAST+ in genomes with fewer proteins matching sequences in the SwissProt database. UBLAST was the fastest-running program and showed the smallest variation from the results obtained using BLAST+. Reduced SwissProt databases did not seem to affect the results much, but the reduction in run time was modest compared to that obtained by switching to UBLAST, LAST, or BLAT.
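One simple way to put a number on "variation from the results obtained using BLAST+" is the mean absolute difference between per-genome estimates. The sketch below assumes each program's estimates are held in a dictionary keyed by genome name. The abstract does not say which statistic the authors used, so this metric, the function name and the values in the example are illustrative assumptions, not results from the study.

```python
from statistics import mean


def mean_abs_deviation(baseline, other):
    """Mean absolute difference between two sets of per-genome estimates.

    Only genomes present in both dictionaries are compared.
    """
    shared = baseline.keys() & other.keys()
    if not shared:
        raise ValueError("no genomes in common")
    return mean(abs(other[g] - baseline[g]) for g in shared)


# Made-up estimates for two hypothetical genomes; not data from the paper.
blast_estimates = {"genome_1": 0.10, "genome_2": 0.20}
fast_estimates = {"genome_1": 0.12, "genome_2": 0.17}
deviation = mean_abs_deviation(blast_estimates, fast_estimates)
```

A smaller value would indicate that the faster program tracks the BLAST+ estimates more closely across the genomes compared.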

CONCLUSIONS

Although faster programs miss sequence matches otherwise found by NCBI's BLAST, the overannotation estimates are very similar, and thus these programs can be used with confidence for this task.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2515/4180129/a8e530158b30/13104_2013_3190_Fig1_HTML.jpg
