Bioinformatic Analysis Group (GABi), Centro de Investigación y Desarrollo en Biotecnología (CIDBIO), Bogotá D.C., Colombia.
OMICS. 2009 Dec;13(6):527-35. doi: 10.1089/omi.2009.0075.
Despite considerable effort to design efficient systems for electronically indexing the information on genes, proteins, structures, and interactions published daily in scientific journals, problems persist in specific tasks such as functional annotation. Functional annotation is critical for bioinformatic routines, for instance in functional genomics and the subsequent prediction of unknown protein function, which depend heavily on the quality of existing annotations. Some information management systems have evolved to incorporate information from large-scale projects efficiently, but annotating single records from the literature is often difficult and slow. In this short report, functional characterizations of a representative sample of the entire set of uncharacterized proteins from Escherichia coli K12 were compiled from Swiss-Prot, PubMed, and EcoCyc; the results demonstrate a functional annotation deficit in biological databases. Several issues are postulated as causes of this lack of annotation, and different solutions are evaluated and proposed to address them. The hope is that these observations will give new impetus to improving the speed and quality of functional annotation and will ultimately provide updated, reliable information to the scientific community.