Tetko Igor V, Brauner Barbara, Dunger-Kaltenbach Irmtraud, Frishman Goar, Montrone Corinna, Fobo Gisela, Ruepp Andreas, Antonov Alexey V, Surmeli Dimitrij, Mewes Hans-Werner
Institute for Bioinformatics (MIPS), GSF National Research Center for Environment and Health, Ingolstaedter Landstrasse 1, D-85764 Neuherberg, Germany.
Bioinformatics. 2005 May 15;21(10):2520-1. doi: 10.1093/bioinformatics/bti380. Epub 2005 Mar 15.
Developing new methods for automatic functional annotation of proteins from their sequences requires high-quality benchmark data as well as tedious preparatory work to generate the sequence parameters that machine learning methods take as input. Different program settings and incompatible protocols make it difficult to compare the analyzed methods.
The MIPS Bacterial Functional Annotation Benchmark dataset (MIPS-BFAB) is a new, high-quality resource comprising four bacterial genomes manually annotated according to the MIPS functional catalogue (FunCat). The resource includes precalculated sequence parameters, such as sequence similarity scores, InterPro domain composition and other parameters that can be used to develop and benchmark methods for functional annotation of bacterial protein sequences. The data are provided in XML format and can be used by scientists who are not necessarily experts in genome annotation.
BFAB is available at http://mips.gsf.de/proj/bfab
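Because the data are distributed as XML, a few lines of standard-library code should be enough to turn a file into per-protein feature records for a machine learning method. The following is a minimal sketch only: the abstract does not describe the BFAB schema, so the element and attribute names (`genome`, `protein`, `funcat`, `interpro`, `similarity`, `target`, `score`) and all sample values are hypothetical placeholders, not the actual file layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical record layout and placeholder values: the real BFAB schema
# is not given in the abstract and will differ from this.
SAMPLE_XML = """
<genome organism="example bacterium">
  <protein id="P1">
    <funcat>01</funcat>
    <funcat>20</funcat>
    <interpro>IPR_A</interpro>
    <similarity target="P2" score="87.5"/>
  </protein>
  <protein id="P2">
    <funcat>01</funcat>
    <interpro>IPR_A</interpro>
    <interpro>IPR_B</interpro>
  </protein>
</genome>
"""


def load_proteins(xml_text):
    """Return {protein_id: {"funcat": [...], "interpro": [...], "similarity": {...}}}."""
    root = ET.fromstring(xml_text.strip())
    proteins = {}
    for node in root.iter("protein"):
        proteins[node.get("id")] = {
            # Manually assigned functional categories (the labels to predict).
            "funcat": [e.text.strip() for e in node.findall("funcat")],
            # Precalculated domain composition (input features).
            "interpro": [e.text.strip() for e in node.findall("interpro")],
            # Precalculated similarity scores to other proteins (input features).
            "similarity": {
                e.get("target"): float(e.get("score"))
                for e in node.findall("similarity")
            },
        }
    return proteins


proteins = load_proteins(SAMPLE_XML)
for pid, record in proteins.items():
    print(pid, record["funcat"], record["interpro"], record["similarity"])
```

Keeping the annotation labels and the precalculated parameters in one record per protein makes it straightforward to split the same data into training and test sets for different methods, which is the kind of like-for-like comparison the benchmark is meant to support.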