Graphlet kernels for prediction of functional residues in protein structures.

Author information

Vacic Vladimir, Iakoucheva Lilia M, Lonardi Stefano, Radivojac Predrag

Affiliation

Department of Computer Science and Engineering, University of California, Riverside, California, USA.

Publication information

J Comput Biol. 2010 Jan;17(1):55-72. doi: 10.1089/cmb.2009.0029.

Abstract

We introduce a novel graph-based kernel method for annotating functional residues in protein structures. A structure is first modeled as a protein contact graph, where nodes correspond to residues and edges connect spatially neighboring residues. Each vertex in the graph is then represented as a vector of counts of labeled non-isomorphic subgraphs (graphlets), centered on the vertex of interest. A similarity measure between two vertices is expressed as the inner product of their respective count vectors and is used in a supervised learning framework to classify protein residues. We evaluated our method on two function prediction problems: identification of catalytic residues in proteins, which is a well-studied problem suitable for benchmarking, and a much less explored problem of predicting phosphorylation sites in protein structures. The performance of the graphlet kernel approach was then compared against two alternative methods, a sequence-based predictor and our implementation of the FEATURE framework. On both tasks, the graphlet kernel performed favorably; however, the margin of difference was considerably higher on the problem of phosphorylation site prediction. While there is data that phosphorylation sites are preferentially positioned in intrinsically disordered regions, we provide evidence that for the sites that are located in structured regions, neither the surface accessibility alone nor the averaged measures calculated from the residue microenvironments utilized by FEATURE were sufficient to achieve high accuracy. The key benefit of the graphlet representation is its ability to capture neighborhood similarities in protein structures via enumerating the patterns of local connectivity in the corresponding labeled graphs.
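The pipeline described in the abstract (contact graph, per-vertex graphlet count vector, inner-product kernel) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes one representative 3D point per residue, an illustrative distance cutoff of 6.0, and graphlets of at most three vertices; the abstract does not state the cutoff or the maximum graphlet size. The function names `contact_graph`, `graphlet_counts`, and `graphlet_kernel` are hypothetical.

```python
from collections import Counter
from itertools import combinations
import math


def contact_graph(coords, cutoff=6.0):
    """Link residues i and j when their points lie within `cutoff`.

    The cutoff value is an illustrative assumption, not taken from the paper.
    """
    adj = {i: set() for i in range(len(coords))}
    for i, j in combinations(range(len(coords)), 2):
        if math.dist(coords[i], coords[j]) <= cutoff:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def graphlet_counts(adj, labels, root):
    """Count labeled induced graphlets (1 to 3 vertices) that contain `root`.

    Keys record the graphlet shape, the root's label, and the labels of the
    other vertices in a canonical order, so equal keys mean matching patterns.
    """
    counts = Counter()
    r = labels[root]
    counts[("node", r)] += 1
    nbrs = sorted(adj[root])
    for u in nbrs:
        counts[("edge", r, labels[u])] += 1
    # Root adjacent to both other vertices: triangle, or path with root in the middle.
    for u, v in combinations(nbrs, 2):
        pair = tuple(sorted((labels[u], labels[v])))
        kind = "triangle" if v in adj[u] else "path_center"
        counts[(kind, r) + pair] += 1
    # Path root - u - w with w not adjacent to root: root sits at one end.
    for u in nbrs:
        for w in adj[u]:
            if w != root and w not in adj[root]:
                counts[("path_end", r, labels[u], labels[w])] += 1
    return counts


def graphlet_kernel(c1, c2):
    """Similarity of two vertices: inner product of their sparse count vectors."""
    return sum(c1[k] * c2[k] for k in c1.keys() & c2.keys())


# Toy example: four residues with made-up coordinates and one-letter labels.
coords = [(0, 0, 0), (4, 0, 0), (8, 0, 0), (4, 3, 0)]
labels = ["S", "G", "S", "G"]
adj = contact_graph(coords)
c0, c1, c2 = (graphlet_counts(adj, labels, v) for v in range(3))
print(graphlet_kernel(c0, c2))  # 10: residues 0 and 2 have identical neighborhoods
print(graphlet_kernel(c0, c1))  # 0: the root labels differ, so no graphlet is shared
```

In a supervised setting, a kernel matrix built this way over labeled residues could be passed to any kernel classifier, such as a support vector machine with a precomputed kernel. Larger graphlets would need a proper canonical form for each rooted shape, which this sketch sidesteps by stopping at three vertices.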

Similar articles

2
Stochastic Graphlet Embedding.
IEEE Trans Neural Netw Learn Syst. 2019 Aug;30(8):2369-2382. doi: 10.1109/TNNLS.2018.2884700. Epub 2018 Dec 24.
6
Biological network comparison using graphlet degree distribution.
Bioinformatics. 2007 Jan 15;23(2):e177-83. doi: 10.1093/bioinformatics/btl301.
10
Atom environment kernels on molecules.
J Chem Inf Model. 2014 May 27;54(5):1289-300. doi: 10.1021/ci400403w. Epub 2014 May 6.

Cited by

1
Simplicity within biological complexity.
Bioinform Adv. 2025 Feb 6;5(1):vbae164. doi: 10.1093/bioadv/vbae164. eCollection 2025.
2
Current and future directions in network biology.
Bioinform Adv. 2024 Aug 14;4(1):vbae099. doi: 10.1093/bioadv/vbae099. eCollection 2024.
5
Classification in biological networks with hypergraphlet kernels.
Bioinformatics. 2021 May 17;37(7):1000-1007. doi: 10.1093/bioinformatics/btaa768.
6
Network-based protein structural classification.
R Soc Open Sci. 2020 Jun 3;7(6):191461. doi: 10.1098/rsos.191461. eCollection 2020 Jun.
7
Network analysis of synonymous codon usage.
Bioinformatics. 2020 Dec 8;36(19):4876-4884. doi: 10.1093/bioinformatics/btaa603.
9
IncGraph: Incremental graphlet counting for topology optimisation.
PLoS One. 2018 Apr 26;13(4):e0195997. doi: 10.1371/journal.pone.0195997. eCollection 2018.

References

10
Predicting protein function from sequence and structure.
Nat Rev Mol Cell Biol. 2007 Dec;8(12):995-1005. doi: 10.1038/nrm2281.
