Friedberg Iddo, Radivojac Predrag
Department of Veterinary Microbiology and Preventive Medicine, Iowa State University, Ames, IA, USA.
Department of Computer Science and Informatics, Indiana University, Bloomington, IN, USA.
Methods Mol Biol. 2017;1446:133-146. doi: 10.1007/978-1-4939-3743-1_10.
A biological experiment is the most reliable way of assigning function to a protein. However, in the era of high-throughput sequencing, scientists are unable to carry out experiments to determine the function of every single gene product. Therefore, to gain insights into the activity of these molecules and guide experiments, we must rely on computational means to functionally annotate the majority of sequence data. To understand how well these algorithms perform, we have established a challenge involving a broad scientific community in which we evaluate different annotation methods according to their ability to predict the associations between previously unannotated protein sequences and Gene Ontology terms. Here we discuss the rationale, benefits, and issues associated with evaluating computational methods in an ongoing community-wide challenge.