Konagurthu Arun S, Subramanian Ramanan, Allison Lloyd, Abramson David, Stuckey Peter J, Garcia de la Banda Maria, Lesk Arthur M
Department of Data Science and Artificial Intelligence, Faculty of Information Technology, Monash University, Clayton, VIC, Australia.
Research Computing Center, University of Queensland, Brisbane, QLD, Australia.
Front Mol Biosci. 2021 Apr 30;7:612920. doi: 10.3389/fmolb.2020.612920. eCollection 2020.
What is the architectural "basis set" of the observed universe of protein structures? Using information-theoretic inference, we answer this question with a dictionary of 1,493 substructures, called concepts, typically at a subdomain level, based on an unbiased subset of known protein structures. Each concept represents a topologically conserved assembly of helices and strands that make contact. Any protein structure can be dissected into instances of concepts from this dictionary. We dissected the Protein Data Bank and completely inventoried all the concept instances. This yields many insights, including correlations between concepts and catalytic activities or binding sites, useful for rational drug design; local amino-acid sequence-structure correlations, useful for structure prediction methods; and information supporting the recognition and exploration of evolutionary relationships, useful for structural studies. An interactive site, Proçodic, at http://lcb.infotech.monash.edu.au/prosodic (click), provides access to and navigation of the entire dictionary of concepts and their usages, and all associated information. This report is part of a continuing programme with the goal of elucidating fundamental principles of protein architecture, in the spirit of the work of Cyrus Chothia.