GPAC: benchmarking the sensitivity of genome informatics analysis to genome annotation completeness.

Authors

Arakawa Kazuharu, Nakayama Yoichi, Tomita Masaru

Affiliation

Institute for Advanced Biosciences, Keio University, Fujisawa, 252-8520 Kanagawa, Japan.

Publication

In Silico Biol. 2006;6(1-2):49-60.

PMID: 16789913

Abstract

In view of the recent explosion in genome sequence data, with 200 or more complete genome sequences currently available, the importance of genome-scale bioinformatics analysis is increasing rapidly. However, computational genome informatics analyses often lack a statistical assessment of their sensitivity to the completeness of the functional annotation. A pre-analysis method that automatically validates the sensitivity of computational genome analyses to genome annotation completeness is therefore useful. In this report we describe the Gene Prediction Accuracy Classification (GPAC) test, which provides statistical evidence of sensitivity by repeating the same analysis for five different gene groups (classified according to annotation accuracy level) and for randomly sampled gene groups containing the same number of genes as each of the five classified groups. Variability in these results is then assessed; if the results vary significantly across data subsets, the analysis is considered "sensitive" to annotation completeness, and careful selection of data is advised prior to the actual in silico analysis. The GPAC test was applied to the analyses of Sakai et al., 2001, and Ohno et al., 2001, and revealed that the analysis of Ohno et al. was more sensitive to annotation completeness. This shows that GPAC can be employed to ascertain the sensitivity of an analysis. The GPAC benchmarking software is freely available in the latest G-language Genome Analysis Environment package, at http://www.g-language.org/.
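
The procedure described in the abstract (run one analysis on each annotation-accuracy group, run it again on size-matched random groups, then compare) can be illustrated with a minimal Python sketch. This is not the G-language implementation. The abstract does not say which statistical test GPAC uses, so the empirical two-sided p-value and the Bonferroni threshold below are assumptions, and the names `gpac_sketch`, `is_sensitive`, `all_genes`, `groups`, and `analysis` are hypothetical.

```python
import random
import statistics
from typing import Callable, Dict, Hashable, List, Sequence

Gene = dict  # hypothetical gene record; any object the analysis understands


def gpac_sketch(
    all_genes: Sequence[Gene],
    groups: Dict[Hashable, List[Gene]],
    analysis: Callable[[Sequence[Gene]], float],
    n_random: int = 200,
    seed: int = 0,
) -> Dict[Hashable, dict]:
    """Compare an analysis on classified gene groups with size-matched random groups.

    `groups` maps an annotation-accuracy label to the genes in that class.
    `analysis` reduces a list of genes to a single number.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    report = {}
    for label, group in groups.items():
        observed = analysis(group)
        # Size-matched random groups, drawn without replacement from all genes.
        null = [
            analysis(rng.sample(all_genes, len(group)))
            for _ in range(n_random)
        ]
        center = statistics.fmean(null)
        # Assumption: empirical two-sided p-value. Count random groups whose
        # result is at least as far from the random-group mean as the
        # classified group's result.
        extreme = sum(abs(x - center) >= abs(observed - center) for x in null)
        report[label] = {
            "observed": observed,
            "null_mean": center,
            "p_value": (extreme + 1) / (n_random + 1),
        }
    return report


def is_sensitive(report: Dict[Hashable, dict], alpha: float = 0.05) -> bool:
    """Flag the analysis as "sensitive" if any group deviates significantly.

    Assumption: Bonferroni correction across the classified groups.
    """
    threshold = alpha / len(report)
    return any(entry["p_value"] < threshold for entry in report.values())
```

In use, `groups` would hold the five annotation-accuracy classes and `analysis` would wrap the genome-scale computation under test. If `is_sensitive` returns `True`, the result depends on which annotation subset is used, which is the situation in which the abstract advises careful data selection before the actual analysis.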

Similar articles

1. GPAC: benchmarking the sensitivity of genome informatics analysis to genome annotation completeness.
   In Silico Biol. 2006;6(1-2):49-60.
2. A procedure for assessing GO annotation consistency.
   Bioinformatics. 2005 Jun;21 Suppl 1:i136-43. doi: 10.1093/bioinformatics/bti1019.
3. The bologna annotation resource: a non hierarchical method for the functional and structural annotation of protein sequences relying on a comparative large-scale genome analysis.
   J Proteome Res. 2009 Sep;8(9):4362-71. doi: 10.1021/pr900204r.
4. The PEDANT genome database in 2005.
   Nucleic Acids Res. 2005 Jan 1;33(Database issue):D308-10. doi: 10.1093/nar/gki019.
5. JUICE: a data management system that facilitates the analysis of large volumes of information in an EST project workflow.
   BMC Bioinformatics. 2006 Nov 23;7:513. doi: 10.1186/1471-2105-7-513.
6. PhyloGena--a user-friendly system for automated phylogenetic annotation of unknown sequences.
   Bioinformatics. 2007 Apr 1;23(7):793-801. doi: 10.1093/bioinformatics/btm016. Epub 2007 Mar 1.
7. Fast model-based protein homology detection without alignment.
   Bioinformatics. 2007 Jul 15;23(14):1728-36. doi: 10.1093/bioinformatics/btm247. Epub 2007 May 8.
8. BABELOMICS: a systems biology perspective in the functional annotation of genome-scale experiments.
   Nucleic Acids Res. 2006 Jul 1;34(Web Server issue):W472-6. doi: 10.1093/nar/gkl172.
9. MADAP, a flexible clustering tool for the interpretation of one-dimensional genome annotation data.
   Nucleic Acids Res. 2007 Jul;35(Web Server issue):W201-5. doi: 10.1093/nar/gkm343. Epub 2007 May 25.
10. Calculation of reliable transcript levels of annotated genes on the basis of multiple probe-sets in Affymetrix microarrays.
    Acta Biochim Pol. 2009;56(2):271-7. Epub 2009 May 12.

Cited by

1. An assessment of genome annotation coverage across the bacterial tree of life.
   Microb Genom. 2020 Mar;6(3). doi: 10.1099/mgen.0.000341.
2. Restauro-G: a rapid genome re-annotation system for comparative genomics.
   Genomics Proteomics Bioinformatics. 2007 Feb;5(1):53-8. doi: 10.1016/S1672-0229(07)60014-X.