Department of Computer Science, Ozyegin University, Istanbul, Turkey.
Computational Biology Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA.
J Comput Biol. 2021 May;28(5):514-525. doi: 10.1089/cmb.2020.0425. Epub 2020 Dec 23.
Computational techniques have been successful at predicting protein function from relational data (functional or physical interactions). These techniques have been used to generate hypotheses and to direct experimental validation. With few exceptions, the task is modeled as a multilabel classification problem in which the labels (functions) are treated independently or semi-independently. However, databases such as the Gene Ontology provide information about the similarities between functions. We explore the Metric Labeling combinatorial optimization problem as a way to exploit heuristically computed distances between functions, with the goal of predicting protein function more accurately in networks derived from physical interactions and from a combination of other data types. To do this, we present a new technique, based on convex optimization, for converting heuristic semimetric distances into a metric with minimum least-squared distortion (LSD). The Metric Labeling approach is shown to outperform five existing techniques for inferring function from networks. These results suggest that Metric Labeling is useful for protein function prediction, and that LSD minimization can help solve the problem of converting heuristic distances to a metric.
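The two ingredients named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract says only that the conversion is based on convex optimization, so Dykstra's cyclic projection method is swapped in here as one standard way to compute a least-squares projection onto the triangle-inequality constraints. The cost function follows the standard Metric Labeling objective (per-node assignment cost plus edge weight times label distance). All names (`nearest_metric`, `metric_labeling_cost`, `triangle_violation`) and the toy proteins, costs, and distances are hypothetical.

```python
from itertools import combinations


def triangle_violation(d):
    """Largest amount by which d[i][j] <= d[i][k] + d[k][j] fails (0 if never)."""
    n = len(d)
    return max(
        [0.0] + [d[i][j] - d[i][k] - d[k][j]
                 for i in range(n) for j in range(n) for k in range(n)]
    )


def nearest_metric(d, sweeps=200):
    """Least-squares projection of a symmetric, zero-diagonal distance matrix
    onto the set of matrices obeying all triangle inequalities.

    Assumption: Dykstra's cyclic projections, one half-space per inequality.
    """
    n = len(d)
    x = {e: float(d[e[0]][e[1]]) for e in combinations(range(n), 2)}
    # Each triple yields three constraints: x[long_side] <= x[a] + x[b].
    constraints = []
    for i, j, k in combinations(range(n), 3):
        e1, e2, e3 = (i, j), (i, k), (j, k)
        constraints += [(e1, e2, e3), (e2, e1, e3), (e3, e1, e2)]
    lam = [0.0] * len(constraints)  # Dykstra correction per constraint
    for _ in range(sweeps):
        for c, (long_side, a, b) in enumerate(constraints):
            # Violation measured after adding back the previous correction.
            v = x[long_side] - x[a] - x[b] + 3.0 * lam[c]
            theta = max(v / 3.0, 0.0)
            delta = theta - lam[c]
            x[long_side] -= delta
            x[a] += delta
            x[b] += delta
            lam[c] = theta
    out = [[0.0] * n for _ in range(n)]
    for (i, j), val in x.items():
        out[i][j] = out[j][i] = val
    return out


def metric_labeling_cost(labels, assign_cost, edges, dist):
    """Standard Metric Labeling objective for a given labeling."""
    node_term = sum(assign_cost[v][labels[v]] for v in labels)
    edge_term = sum(w * dist[labels[u]][labels[v]] for u, v, w in edges)
    return node_term + edge_term


# Toy semimetric over three functions: 5 > 1 + 1 breaks the triangle inequality.
heuristic = [[0, 1, 5], [1, 0, 1], [5, 1, 0]]
metric = nearest_metric(heuristic)

# Two hypothetical proteins joined by one interaction of weight 1.0.
labels = {"p1": 0, "p2": 2}
assign_cost = {"p1": [0.1, 0.9, 0.9], "p2": [0.8, 0.8, 0.2]}
edges = [("p1", "p2", 1.0)]
print(metric, metric_labeling_cost(labels, assign_cost, edges, metric))
```

The constraint list grows cubically with the number of labels, so this sketch suits only small label sets; a vocabulary the size of the Gene Ontology would need a sparser or more specialized solver.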