Troy Hawkins, Daisuke Kihara
Department of Biological Sciences, Purdue University, West Lafayette, IN, USA.
J Bioinform Comput Biol. 2007 Feb;5(1):1-30. doi: 10.1142/s0219720007002503.
Predicting the function of uncharacterized protein sequences generated by genome projects has emerged as an important focus of computational biology. We categorize several approaches that go beyond traditional sequence similarity and exploit the large amounts of available data for computational function prediction: structure-based, association-based (genomic context), interaction-based (cellular context), process-based (metabolic context), and proteomics-experiment-based methods. Because these methods incorporate structural and experimental data that sequence-based methods do not use, they can improve the accuracy and reliability of protein function prediction. We first review the definition of protein function. We then introduce recent developments in these methods, with special focus on the types of predictions that can be made. We emphasize the need for further development of comprehensive systems biology techniques that can exploit the ever-increasing data produced by the genomics and proteomics communities. For the reader's convenience, tables of useful online resources in each category are included. The role of computational scientists in the near future of biological research and the interplay between computational and experimental biology are also addressed.