Assignment of structural domains in proteins using diffusion kernels on graphs.

Affiliations

Department of Bioinformatics, Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran, Iran.

Department of Biophysics, Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran, Iran.

Publication information

BMC Bioinformatics. 2022 Sep 8;23(1):369. doi: 10.1186/s12859-022-04902-9.

Abstract

Although algorithmic approaches to protein domain decomposition have long attracted great interest, the inherent ambiguity of the problem keeps it an active area of research. Moreover, accurate automated methods are in high demand as the number of solved structures for complex proteins rises. While the majority of previous efforts to decompose 3D structures have centered on developing clustering algorithms, the use of enhanced measures of proximity between amino acids has remained largely unexplored. If there exists a kernel function in whose reproducing kernel Hilbert space the structural domains of proteins become well separated, then protein structures can be parsed into domains without a complex clustering algorithm. Inspired by this idea, we developed a protein domain decomposition method based on diffusion kernels on protein graphs. We examined all combinations of four graph node kernels and two clustering algorithms to investigate their capability to decompose protein structures. The proposed method was tested on five of the most commonly used benchmark datasets for protein domain assignment plus a comprehensive non-redundant dataset. The results show that the method, using one of the diffusion kernels, performs competitively with four of the best automatic methods. Our method can also offer alternative partitionings for the same structure, which is in line with the subjective definition of a protein domain. With competitive accuracy and balanced performance on simple and complex structures, despite relying on a relatively naive criterion to choose the optimal decomposition, the proposed method shows that diffusion kernels on graphs in particular, and kernel functions in general, are promising measures for parsing proteins into domains and for performing other structural analyses on proteins.
The size and interconnectedness of protein graphs make them promising targets for diffusion kernels as measures of affinity between amino acids. The versatility of our method allows future kernels with higher performance to be plugged in. The source code of the proposed method is available at https://github.com/taherimo/kludo. The method is also available as a web application at https://cbph.ir/tools/kludo.
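
To make the central idea concrete, the following is a minimal sketch of how a diffusion kernel on a graph can separate densely connected groups of nodes. It is not the implementation from the kludo repository. It assumes the standard diffusion kernel K = exp(-beta * L), where L is the graph Laplacian, whereas the abstract does not name the four kernels that were examined. The toy graph (two cliques joined by one edge, standing in for two domains), the value of `beta`, the plain kernel k-means split, and all function names are illustrative assumptions. The code uses only the Python standard library so that it runs anywhere; in practice a routine such as `scipy.linalg.expm` would replace the hand-written matrix exponential.

```python
"""Toy sketch: diffusion kernel on a small graph, then a two-way kernel k-means split."""


def mat_mul(a, b):
    """Multiply two square matrices stored as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def graph_laplacian(adj):
    """Return L = D - A for a symmetric adjacency matrix with a zero diagonal."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(n)]
            for i in range(n)]


def diffusion_kernel(adj, beta=1.0, squarings=8, terms=12):
    """Approximate K = exp(-beta * L) with a scaled Taylor series and repeated squaring."""
    n = len(adj)
    lap = graph_laplacian(adj)
    scale = beta / (2 ** squarings)
    small = [[-scale * lap[i][j] for j in range(n)] for i in range(n)]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # term holds small^k / k! after this step
        term = [[x / k for x in row] for row in mat_mul(term, small)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        # undo the scaling: exp(M) = (exp(M / 2^s))^(2^s)
        result = mat_mul(result, result)
    return result


def kernel_kmeans_two_way(kernel, max_iter=50):
    """Split nodes into two groups with kernel k-means (deterministic seeding)."""
    n = len(kernel)

    def dist2(i, members):
        # Squared feature-space distance from node i to the centroid of `members`.
        m = len(members)
        cross = sum(kernel[i][j] for j in members) / m
        within = sum(kernel[j][l] for j in members for l in members) / (m * m)
        return kernel[i][i] - 2.0 * cross + within

    # Seed with node 0 and the node farthest from it in kernel space.
    far = max(range(n), key=lambda j: kernel[0][0] + kernel[j][j] - 2.0 * kernel[0][j])
    labels = [0 if dist2(i, [0]) <= dist2(i, [far]) else 1 for i in range(n)]
    for _ in range(max_iter):
        groups = [[i for i in range(n) if labels[i] == c] for c in (0, 1)]
        if not groups[0] or not groups[1]:
            break
        new = [0 if dist2(i, groups[0]) <= dist2(i, groups[1]) else 1 for i in range(n)]
        if new == labels:
            break
        labels = new
    return labels


def two_clique_graph(size=5):
    """Hypothetical toy 'protein graph': two cliques joined by a single edge."""
    n = 2 * size
    adj = [[0.0] * n for _ in range(n)]
    for block in (range(size), range(size, n)):
        for i in block:
            for j in block:
                if i != j:
                    adj[i][j] = 1.0
    adj[size - 1][size] = adj[size][size - 1] = 1.0
    return adj


kernel = diffusion_kernel(two_clique_graph(), beta=1.0)
print(kernel_kmeans_two_way(kernel))
```

In this toy case the kernel values between nodes of the same clique are much larger than those across the bridge, so even a very simple clustering step recovers the two groups. That is the intuition behind the claim that a well-chosen kernel reduces the need for a complex clustering algorithm. A real protein graph would be built from residue contacts, and choosing the number of domains requires a separate criterion that this sketch does not cover.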
