Meinicke Peter
Department of Bioinformatics, Institute of Microbiology and Genetics, Georg-August-University Göttingen, Germany.
BMC Genomics. 2009 Sep 2;10:409. doi: 10.1186/1471-2164-10-409.
Functional profiling is a key technique to characterize and compare the functional potential of entire genomes. The estimation of profiles according to an assignment of sequences to functional categories is a computationally expensive task because it requires the comparison of all protein sequences from a genome with a usually large database of annotated sequences or sequence families.
Based on machine learning techniques for Pfam domain detection, the UFO web server for ultra-fast functional profiling allows researchers to process large protein sequence collections within seconds. Besides the frequencies of Pfam and GO categories, the user also obtains the sequence-specific assignments to Pfam domain families. In addition, a comparison with existing genomes provides dissimilarity scores with respect to 821 reference proteomes. For the underlying UFO domain detection, the results on 206 test genomes indicate a high sensitivity of the approach. In comparison with current state-of-the-art HMMs, the runtime measurements show a speed-up of about four orders of magnitude. For an average-sized prokaryotic genome, computing a functional profile and comparing it with the reference profiles typically requires about 10 seconds of processing time.
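To illustrate what a functional profile and its comparison with reference proteomes could look like, here is a minimal Python sketch. It is not the UFO implementation: the function names, the example Pfam assignments, the reference profiles, and the choice of Euclidean distance as dissimilarity score are all assumptions made for illustration, since the abstract does not specify the measure used.

```python
from collections import Counter
from math import sqrt


def functional_profile(assignments):
    """Relative frequency of each Pfam family over all detected domains.

    `assignments` maps a sequence ID to the list of Pfam families
    detected in that sequence (hypothetical input format).
    """
    counts = Counter(fam for fams in assignments.values() for fam in fams)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {fam: n / total for fam, n in counts.items()}


def dissimilarity(p, q):
    """Euclidean distance between two profiles (assumed measure)."""
    families = set(p) | set(q)
    return sqrt(sum((p.get(f, 0.0) - q.get(f, 0.0)) ** 2 for f in families))


def rank_references(query, references):
    """Sort reference proteomes by increasing dissimilarity to the query."""
    scored = [(name, dissimilarity(query, prof)) for name, prof in references.items()]
    return sorted(scored, key=lambda item: item[1])


# Made-up sequence-specific assignments for a tiny "genome".
assignments = {
    "seq1": ["PF00005", "PF00072"],
    "seq2": ["PF00005"],
    "seq3": [],  # no domain detected
}
query = functional_profile(assignments)

# Made-up precomputed reference profiles.
references = {
    "ref_A": {"PF00005": 0.6, "PF00072": 0.4},
    "ref_B": {"PF00001": 1.0},
}
ranking = rank_references(query, references)
```

In this toy example `ranking` lists `ref_A` first, because its profile shares both families with the query, while `ref_B` has no family in common. With precomputed reference profiles, the comparison step amounts to one distance computation per reference proteome.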
The UFO web server makes it possible, for the first time, to get a quick overview of the functional inventory of newly sequenced organisms. The genome-scale comparison with a large number of precomputed profiles supports a first guess at functionally related organisms. The service is freely available and does not require user registration or a valid email address.