Townsley Thomas D, Wilson James T, Akers Harrison, Bryant Timothy, Cordova Salvador, Wallace T L, Durston Kirk K, Deweese Joseph E
Department of Computational Sciences, College of Computing & Technology, Lipscomb University, Nashville, TN 37204, USA.
Department of Pharmaceutical Sciences, College of Pharmacy and Health Sciences, Lipscomb University, Nashville, TN 37204, USA.
Bioinform Adv. 2022 Aug 18;2(1):vbac058. doi: 10.1093/bioadv/vbac058. eCollection 2022.
AlphaFold has been a major advance in predicting protein structure, but it leaves open the problem of determining which sub-molecular components of a protein are essential for its function within the cell. Direct coupling analysis predicts two- and three-amino acid contacts, but there may be essential interdependencies between sites that are not proximal in the 3D structure. The aim is to design a computational method that locates and ranks essential non-proximal interdependencies involving five or more amino acids within a protein, using large multiple sequence alignments (MSAs) of both globular and intrinsically unstructured proteins.
We developed PSICalc (Protein Subdomain Interdependency Calculator), a laptop-friendly, pattern-discovery bioinformatics software tool that analyzes large MSAs of both structured and unstructured proteins. It locates both proximal and non-proximal interdependent sites, groups them into pairwise (second-order), third-order and higher-order clusters using a k-modes approach, and provides ranked results within minutes. To aid in visualizing these interdependencies, we developed a graphical user interface that displays the subdomain relationships as a polytree graph. As a demonstration, we provide examples of both proximal and non-proximal interdependencies documented for eukaryotic topoisomerase II, including interdependencies between the unstructured C-terminal domain and the N-terminal domain.
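The idea of locating interdependent MSA columns and growing pairwise relationships into higher-order clusters can be illustrated with a minimal sketch. This is not PSICalc's implementation, which uses a k-modes approach. The sketch assumes a simpler stand-in: normalized mutual information between columns, followed by merging of pairs that share a site. The function names, the 0.9 threshold and the toy alignment are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def column(msa, j):
    """Return column j of an alignment given as equal-length strings."""
    return [seq[j] for seq in msa]


def entropy(symbols):
    """Shannon entropy (bits) of a list of hashable symbols."""
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in Counter(symbols).values())


def normalized_mi(col_a, col_b):
    """Mutual information of two columns divided by the larger entropy."""
    h_a, h_b = entropy(col_a), entropy(col_b)
    h_ab = entropy(list(zip(col_a, col_b)))
    denom = max(h_a, h_b)
    return (h_a + h_b - h_ab) / denom if denom > 0 else 0.0


def rank_pairs(msa, threshold=0.9):
    """Score every column pair; keep those at or above the (assumed) threshold."""
    scored = []
    for i, j in combinations(range(len(msa[0])), 2):
        score = normalized_mi(column(msa, i), column(msa, j))
        if score >= threshold:
            scored.append((i, j, score))
    return sorted(scored, key=lambda t: -t[2])


def merge_into_clusters(pairs):
    """Merge pairs that share a site into higher-order clusters."""
    clusters = []
    for i, j, _ in pairs:
        hits = [c for c in clusters if i in c or j in c]
        merged = {i, j}.union(*hits)
        clusters = [c for c in clusters if c not in hits] + [merged]
    return clusters


# Hypothetical toy alignment: columns 0, 1 and 3 covary perfectly,
# column 2 is fully conserved, column 4 varies independently.
toy_msa = ["AKLDE", "AKLDQ", "GRLEE", "GRLEQ", "AKLDE", "GRLEQ"]

pairs = rank_pairs(toy_msa)
clusters = merge_into_clusters(pairs)
print(pairs)
print(clusters)
```

Note that the score depends only on how columns covary across sequences, not on their distance in the sequence or in the 3D structure, which is why this kind of approach can surface non-proximal interdependencies. A fully conserved column carries no information about any other column and therefore scores zero.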
PSICalc is available at https://github.com/jdeweeselab/psicalc-package.
Supplementary data are available at Bioinformatics Advances online.