Baumgartner William A, Cohen K Bretonnel, Fox Lynne M, Acquaah-Mensah George, Hunter Lawrence
Center for Computational Pharmacology, University of Colorado School of Medicine, USA.
Bioinformatics. 2007 Jul 1;23(13):i41-8. doi: 10.1093/bioinformatics/btm229.
Knowledge base construction has been an area of intense activity and great importance in the growth of computational biology. However, there has been little or no work on evaluating knowledge bases, either with respect to their contents or with respect to the processes by which they are constructed. This article proposes applying a software engineering metric, the found/fixed graph, to the problem of evaluating the processes by which genomic knowledge bases are built, as well as the completeness of their contents.
Well-understood patterns of change in the found/fixed graph are found to occur in two large publicly available knowledge bases. These patterns suggest that the current manual curation processes will take far too long to complete the annotations of even just the most important model organisms, and that at their current rate of production, they will never be sufficient for completing the annotation of all currently available proteomes.
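To make the metric concrete: in software engineering, a found/fixed graph plots the cumulative number of defects found against the cumulative number fixed over time, and the gap between the two curves shows how much open work remains. The sketch below is a minimal illustration of that idea, not the authors' implementation. It assumes that, for a knowledge base, "found" could stand for items identified as needing annotation and "fixed" for items that have been annotated. The function names and all numbers are hypothetical and chosen only for illustration.

```python
import math
from itertools import accumulate


def found_fixed_series(found_per_period, fixed_per_period):
    """Return cumulative found, cumulative fixed, and the open gap per period.

    Assumption: each list holds the count of new items found (or fixed)
    in successive, equally spaced periods such as releases or months.
    """
    if len(found_per_period) != len(fixed_per_period):
        raise ValueError("both series must cover the same periods")
    cum_found = list(accumulate(found_per_period))
    cum_fixed = list(accumulate(fixed_per_period))
    # The vertical distance between the two curves is the open backlog.
    open_gap = [f - x for f, x in zip(cum_found, cum_fixed)]
    return cum_found, cum_fixed, open_gap


def periods_to_close(open_gap, found_rate, fixed_rate):
    """Naive linear extrapolation of when the backlog reaches zero.

    Returns None when items are found at least as fast as they are fixed,
    i.e. the gap never closes at the current rates.
    """
    if open_gap <= 0:
        return 0
    net_rate = fixed_rate - found_rate
    if net_rate <= 0:
        return None
    return math.ceil(open_gap / net_rate)


# Invented example data: items found and fixed in four periods.
found = [10, 12, 9, 11]
fixed = [4, 5, 6, 5]
cum_found, cum_fixed, gap = found_fixed_series(found, fixed)
```

With these invented numbers the gap widens in every period, which is the kind of pattern that would indicate a curation process falling behind. Plotting `cum_found` and `cum_fixed` on the same axes gives the graph itself.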