Cozzetto Domenico, Jones David T
Bioinformatics Group, Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, UK.
Methods Mol Biol. 2017;1446:55-67. doi: 10.1007/978-1-4939-3743-1_5.
Surveys of public sequence resources show that experimentally supported functional information is still completely missing for a considerable fraction of known proteins and is clearly incomplete for an even larger portion. Bioinformatics methods have long used diverse data sources, alone or in combination, to predict protein function, on the understanding that different data types help elucidate complementary biological roles. This chapter focuses on methods that accept amino acid sequences as input and produce Gene Ontology (GO) term assignments directly as output; it presents the relevant biological and computational concepts along with the advantages and limitations of individual approaches.