Arakawa Kazuharu, Nakayama Yoichi, Tomita Masaru
Institute for Advanced Biosciences, Keio University, Fujisawa, 252-8520 Kanagawa, Japan.
In Silico Biol. 2006;6(1-2):49-60.
In view of the recent explosion in genome sequence data, with 200 or more complete genome sequences currently available, the importance of genome-scale bioinformatics analysis is increasing rapidly. However, computational genome informatics analyses often lack a statistical assessment of their sensitivity to the completeness of the functional annotation. A pre-analysis method that automatically validates how sensitive a computational genome analysis is to genome annotation completeness would therefore be useful. In this report, we developed the Gene Prediction Accuracy Classification (GPAC) test. It provides statistical evidence of sensitivity by repeating the same analysis on two kinds of gene group: five groups classified according to annotation accuracy level, and randomly sampled groups containing the same number of genes as each of the five classified groups. The variability of these results is then assessed. If the results vary significantly between data subsets, the analysis is considered "sensitive" to annotation completeness, and careful selection of data is advised before the actual in silico analysis. We applied the GPAC test to the analyses of Sakai et al. (2001) and Ohno et al. (2001), and it revealed that the analysis of Ohno et al. was more sensitive to annotation completeness. These results show that GPAC can be used to ascertain the sensitivity of an analysis. The GPAC benchmarking software is freely available in the latest G-language Genome Analysis Environment package, at http://www.g-language.org/.
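The core idea of the test can be illustrated with a small sketch. This is a minimal illustration under stated assumptions, not the GPAC implementation in the G-language package:

- Each gene is assumed to carry an annotation-accuracy class (1 to 5) and a per-gene value.
- The "analysis" is assumed to be any function that reduces a list of values to one number, for example a mean.
- Variability is assumed to be judged with a two-sided empirical p-value against random groups of the same size.

The names `gpac_like_test` and `is_sensitive` are hypothetical. The abstract does not specify the exact statistic the authors use.

```python
import random
import statistics
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical result type: class -> (observed statistic, empirical p-value, significant?)
Result = Dict[int, Tuple[float, float, bool]]


def gpac_like_test(
    genes: Sequence[Tuple[int, float]],        # (annotation class, per-gene value); assumed input shape
    analysis: Callable[[List[float]], float],  # the analysis whose sensitivity is being checked
    n_random: int = 1000,                      # number of random groups drawn per class (assumption)
    alpha: float = 0.05,                       # significance threshold (assumption)
    seed: int = 0,
) -> Result:
    """Sketch of a GPAC-style check: compare each class-specific result
    against results from random gene groups of the same size."""
    rng = random.Random(seed)
    all_values = [v for _, v in genes]
    result: Result = {}
    for cls in sorted({c for c, _ in genes}):
        subset = [v for c, v in genes if c == cls]
        observed = analysis(subset)
        # Repeat the same analysis on randomly sampled groups with the same gene count.
        null = [analysis(rng.sample(all_values, len(subset))) for _ in range(n_random)]
        centre = statistics.fmean(null)
        # Two-sided empirical p-value: how often a random group deviates at least as much.
        extreme = sum(1 for x in null if abs(x - centre) >= abs(observed - centre))
        p = (extreme + 1) / (n_random + 1)
        result[cls] = (observed, p, p < alpha)
    return result


def is_sensitive(result: Result) -> bool:
    """Treat the analysis as 'sensitive' if any accuracy class departs from random expectation."""
    return any(significant for _, _, significant in result.values())
```

For example, if the best-annotated class gives a very different mean from randomly drawn groups of equal size, `is_sensitive` returns `True`. That outcome would suggest choosing the input genes carefully before running the real analysis. Drawing the random groups from the full gene set, rather than from the other classes only, is a design choice made here for simplicity.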