Accurate classification of protein structural families using coherent subgraph analysis.

Authors

Huan J, Wang W, Washington A, Prins J, Shah R, Tropsha A

Affiliation

Department of Computer Science, School of Pharmacy, University of North Carolina, Chapel Hill, NC 27599, USA.

Publication information

Pac Symp Biocomput. 2004:411-22. doi: 10.1142/9789812704856_0039.

DOI:10.1142/9789812704856_0039

PMID:14992521

Abstract

Protein structural annotation and classification is an important problem in bioinformatics. We report on the development of an efficient subgraph mining technique and its application to finding characteristic substructural patterns within protein structural families. In our method, protein structures are represented by graphs where the nodes are residues and the edges connect residues found within a certain distance of each other. Application of subgraph mining to proteins is challenging for a number of reasons: (1) protein graphs are large and complex, (2) current protein databases are large and continue to grow rapidly, and (3) only a small fraction of the frequent subgraphs among the huge pool of all possible subgraphs could be significant in the context of protein classification. To address these challenges, we have developed an information-theoretic model called coherent subgraph mining. From information theory, the entropy of a random variable X measures the information content carried by X, and the Mutual Information (MI) between two random variables X and Y measures the correlation between X and Y. We define a subgraph X as coherent if it is strongly correlated with every sufficiently large sub-subgraph Y embedded in it. Based on the MI metric, we have designed a search scheme that only reports coherent subgraphs. To determine the significance of coherent protein subgraphs, we have conducted an experimental study in which all coherent subgraphs were identified in several protein structural families annotated in the SCOP database (Murzin et al., 1995). The Support Vector Machine algorithm was used to classify proteins from different families under the binary classification scheme. We find that this approach identifies spatial motifs unique to individual SCOP families and affords excellent discrimination between families.
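
The two ideas in the abstract, a residue contact graph and an MI-based coherence test, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names (`contact_graph`, `mutual_information`, `is_coherent`), the use of one coordinate per residue, the distance cutoff, the MI threshold, and the encoding of each subgraph as a 0/1 vector recording whether it occurs in each protein of a data set are all assumptions made for illustration.

```python
import math
from itertools import combinations


def contact_graph(coords, cutoff):
    """Edges (i, j), i < j, between residues whose points lie within `cutoff`.

    `coords` holds one 3D point per residue (an assumption; the abstract
    does not say which atom or centroid represents a residue).
    """
    edges = set()
    for i, j in combinations(range(len(coords)), 2):
        if math.dist(coords[i], coords[j]) <= cutoff:
            edges.add((i, j))
    return edges


def mutual_information(x, y):
    """Mutual information, in bits, between two equal-length 0/1 vectors."""
    n = len(x)
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = sum(1 for xi, yi in zip(x, y) if xi == a and yi == b) / n
            p_a = sum(1 for xi in x if xi == a) / n
            p_b = sum(1 for yi in y if yi == b) / n
            if p_ab > 0:  # zero-probability cells contribute nothing
                mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


def is_coherent(pattern_occ, sub_occs, threshold):
    """True if the pattern's occurrence vector has MI >= threshold with
    the occurrence vector of every sub-pattern in `sub_occs`.

    Choosing which sub-patterns count as "sufficiently large" is left to
    the caller; `threshold` is a hypothetical tuning parameter.
    """
    return all(mutual_information(pattern_occ, s) >= threshold for s in sub_occs)


# Toy data: three residues on a line, and occurrence vectors over four proteins.
edges = contact_graph([(0, 0, 0), (3, 0, 0), (10, 0, 0)], cutoff=5.0)
pattern = [1, 1, 0, 0]          # pattern X occurs in proteins 0 and 1
subs = [[1, 1, 0, 0], [1, 1, 1, 0]]  # sub-patterns occur wherever X does
coherent = is_coherent(pattern, subs, threshold=0.3)
```

In this toy encoding a sub-pattern always occurs wherever the full pattern does, so a low MI value signals that the sub-pattern also appears in many proteins without the full pattern, which is the situation a coherence test is meant to filter out.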

Similar articles

Comparing graph representations of protein structure for mining family-specific residue-based packing motifs.

J Comput Biol. 2005 Jul-Aug;12(6):657-71. doi: 10.1089/cmb.2005.12.657.

Coupling Graphs, Efficient Algorithms and B-Cell Epitope Prediction.

IEEE/ACM Trans Comput Biol Bioinform. 2014 Jan-Feb;11(1):7-16. doi: 10.1109/TCBB.2013.136.

Discovering interesting molecular substructures for molecular classification.

IEEE Trans Nanobioscience. 2010 Jun;9(2):77-89. doi: 10.1109/TNB.2010.2042609.

Distance-based identification of structure motifs in proteins using constrained frequent subgraph mining.

Comput Syst Bioinformatics Conf. 2006:227-38.

Discovery of Functional Motifs from the Interface Region of Oligomeric Proteins Using Frequent Subgraph Mining.

IEEE/ACM Trans Comput Biol Bioinform. 2019 Sep-Oct;16(5):1537-1549. doi: 10.1109/TCBB.2017.2756879. Epub 2017 Sep 26.

Identification of family-specific residue packing motifs and their use for structure-based protein function prediction: I. Method development.

J Comput Aided Mol Des. 2009 Nov;23(11):773-84. doi: 10.1007/s10822-009-9273-4. Epub 2009 Jun 20.

Discriminative Feature Selection for Uncertain Graph Classification.

Proc SIAM Int Conf Data Min. 2013;2013:82-93. doi: 10.1137/1.9781611972832.10.

A Two-Phase Algorithm for Differentially Private Frequent Subgraph Mining.

IEEE Trans Knowl Data Eng. 2018 Aug 1;30(8):1411-1425. doi: 10.1109/tkde.2018.2793862. Epub 2018 Jan 15.

Network subgraph-based approach for analyzing and comparing molecular networks.

PeerJ. 2022 May 3;10:e13137. doi: 10.7717/peerj.13137. eCollection 2022.

Cited by

Neuronal Graphs: A Graph Theory Primer for Microscopic, Functional Networks of Neurons Recorded by Calcium Imaging.

Front Neural Circuits. 2021 Jun 10;15:662882. doi: 10.3389/fncir.2021.662882. eCollection 2021.

Methods for Proteogenomics Data Analysis, Challenges, and Scalability Bottlenecks: A Survey.

IEEE Access. 2021;9:5497-5516. doi: 10.1109/ACCESS.2020.3047588. Epub 2020 Dec 25.

Characterizing the regularity of tetrahedral packing motifs in protein tertiary structure.

Bioinformatics. 2010 Dec 15;26(24):3059-66. doi: 10.1093/bioinformatics/btq573. Epub 2010 Nov 2.

Discrimination of thermophilic and mesophilic proteins.

BMC Struct Biol. 2010 May 17;10 Suppl 1(Suppl 1):S5. doi: 10.1186/1472-6807-10-S1-S5.

GPD: a graph pattern diffusion kernel for accurate graph classification with applications in cheminformatics.

IEEE/ACM Trans Comput Biol Bioinform. 2010 Apr-Jun;7(2):197-207. doi: 10.1109/TCBB.2009.80.

GPM: A Graph Pattern Matching Kernel with Diffusion for Chemical Compound Classification.

Proc IEEE Int Symp Bioinformatics Bioeng. 2008 Dec 8;2008:1-6. doi: 10.1109/BIBE.2008.4696654.

STRALCP--structure alignment-based clustering of proteins.

Nucleic Acids Res. 2007;35(22):e150. doi: 10.1093/nar/gkm1049. Epub 2007 Nov 26.
