

Metric Labeling and Semimetric Embedding for Protein Annotation Prediction.

Author information

Department of Computer Science, Ozyegin University, Istanbul, Turkey.

Computational Biology Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA.

Publication information

J Comput Biol. 2021 May;28(5):514-525. doi: 10.1089/cmb.2020.0425. Epub 2020 Dec 23.

Abstract

Computational techniques have been successful at predicting protein function from relational data (functional or physical interactions). These techniques have been used to generate hypotheses and to direct experimental validation. With few exceptions, the task is modeled as a multilabel classification problem in which the labels (functions) are treated independently or semi-independently. However, databases such as the Gene Ontology provide information about the similarities between functions. We explore the use of the Metric Labeling combinatorial optimization problem to exploit heuristically computed distances between functions and thereby make more accurate predictions of protein function in networks derived from both physical interactions and a combination of other data types. To do this, we give a new technique (based on convex optimization) for converting heuristic semimetric distances into a metric with minimum least-squared distortion (LSD). The Metric Labeling approach is shown to outperform five existing techniques for inferring function from networks. These results suggest that Metric Labeling is useful for protein function prediction, and that LSD minimization can help solve the problem of converting heuristic distances to a metric.
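To make the LSD step concrete, the following is a minimal sketch of one way to convert a semimetric into the nearest metric in the least-squares sense. It is not the authors' implementation: the abstract states only that their technique is based on convex optimization, so the solver used here (cyclic projection onto the triangle-inequality constraints with dual corrections, in the style of Hildreth's and Dykstra's methods) is an assumption, as are the function name `nearest_metric` and its parameters. The input is assumed to be a symmetric matrix with a zero diagonal.

```python
from itertools import combinations


def nearest_metric(s, sweeps=200, tol=1e-9):
    """Least-squares projection of a symmetric dissimilarity matrix onto
    the set of matrices that satisfy every triangle inequality.

    Hypothetical helper for illustration only. `s` is a list of lists,
    assumed symmetric with zeros on the diagonal.
    """
    n = len(s)
    x = [row[:] for row in s]   # working copy, kept symmetric
    z = {}                      # one dual variable per triangle constraint
    for _ in range(sweeps):
        largest_step = 0.0
        for i, j in combinations(range(n), 2):
            for k in range(n):
                if k == i or k == j:
                    continue
                # Constraint: x[i][j] <= x[i][k] + x[k][j]
                violation = x[i][j] - x[i][k] - x[k][j]
                old = z.get((i, j, k), 0.0)
                # The constraint touches three entries, so its squared norm is 3.
                new = max(0.0, old + violation / 3.0)
                step = new - old
                if step != 0.0:
                    z[(i, j, k)] = new
                    x[i][j] -= step
                    x[j][i] = x[i][j]
                    x[i][k] += step
                    x[k][i] = x[i][k]
                    x[k][j] += step
                    x[j][k] = x[k][j]
                    largest_step = max(largest_step, abs(step))
        if largest_step < tol:
            break
    return x


def is_metric(d, eps=1e-6):
    """Check every triangle inequality up to a small tolerance."""
    n = len(d)
    return all(d[i][j] <= d[i][k] + d[k][j] + eps
               for i in range(n) for j in range(n) for k in range(n))


# Toy semimetric over three functions: 5 > 1 + 1 violates the triangle inequality.
semimetric = [[0.0, 1.0, 5.0],
              [1.0, 0.0, 1.0],
              [5.0, 1.0, 0.0]]
repaired = nearest_metric(semimetric)
```

In this toy case the single violated triangle is repaired by spreading the excess evenly over its three distances, which is what a least-squares criterion favors over changing only the largest entry.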




