Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA; Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing 100124, China.
Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA.
J Mol Biol. 2021 May 14;433(10):166944. doi: 10.1016/j.jmb.2021.166944. Epub 2021 Mar 16.
Genome-wide protein-protein interaction (PPI) determination remains a significant unsolved problem in structural biology. The difficulty is twofold: high-throughput experiments (HTEs) often have a relatively high false-positive rate in assigning PPIs, and PPI quaternary structures are more difficult to solve than tertiary structures using traditional structural biology techniques. We proposed a unified pipeline, Threpp, to address both problems. Starting from a pair of monomer sequences, Threpp first threads both sequences through a library of protein complex structures, and the resulting alignment score is combined with HTE data in a naïve Bayesian classifier to predict the likelihood that the two chains interact. Next, quaternary structures of the identified PPIs are constructed by reassembling the monomeric alignments onto dimeric threading frameworks through interface-specific structural alignments. Applied to the Escherichia coli genome, the pipeline identified 35,125 high-confidence PPIs, 4.5-fold more than HTE alone. Graph analyses of the PPI networks show a scale-free cluster size distribution, consistent with previous studies and found to be critical to the robustness of genome evolution, as well as the centrality of functionally important proteins that are essential to E. coli survival. Furthermore, complex structure models were constructed for all predicted E. coli PPIs from the quaternary threading alignments; 6771 of them had a high confidence score corresponding to a correct fold of the complex (TM-score > 0.5), and 39 agreed closely with subsequently released experimental structures (average TM-score = 0.73). These results demonstrate the usefulness of threading-based homology modeling for both genome-wide PPI network detection and complex structure construction.
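The abstract states only that Threpp combines the threading alignment score with HTE data through a naïve Bayesian classifier. The sketch below illustrates that general idea under loudly stated assumptions: the score bins and their thresholds (`score_bin`), the Laplace smoothing, the class name `NaiveBayesPPI`, and the toy training set are all hypothetical and are not taken from Threpp. The two evidence sources are assumed to be conditionally independent given the interaction label, which is the defining naïve Bayes assumption.

```python
import math
from collections import Counter

# Hypothetical discretization of the threading alignment score (thresholds are illustrative only).
SCORE_BINS = ("low", "mid", "high")
HTE_VALUES = (False, True)  # whether a high-throughput experiment reports the pair


def score_bin(score: float) -> str:
    """Map a raw threading score onto a coarse bin (assumed thresholds, not Threpp's)."""
    if score < 2.0:
        return "low"
    if score < 4.0:
        return "mid"
    return "high"


class NaiveBayesPPI:
    """Naive Bayes over two discrete features, assumed conditionally independent given the label."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha  # Laplace smoothing pseudo-count

    def fit(self, examples):
        """examples: iterable of (score_bin, hte_flag, interacts) triples."""
        self.class_counts = Counter()
        self.score_counts = {True: Counter(), False: Counter()}
        self.hte_counts = {True: Counter(), False: Counter()}
        for s, h, y in examples:
            self.class_counts[y] += 1
            self.score_counts[y][s] += 1
            self.hte_counts[y][h] += 1
        return self

    def _cond(self, counts, y, value, n_values):
        # Smoothed estimate of P(feature = value | label = y).
        return (counts[y][value] + self.alpha) / (self.class_counts[y] + self.alpha * n_values)

    def predict_proba(self, s, h) -> float:
        """Posterior P(interacts | score bin, HTE flag)."""
        n = sum(self.class_counts.values())
        log_post = {}
        for y in (True, False):
            prior = (self.class_counts[y] + self.alpha) / (n + 2 * self.alpha)
            log_post[y] = (
                math.log(prior)
                + math.log(self._cond(self.score_counts, y, s, len(SCORE_BINS)))
                + math.log(self._cond(self.hte_counts, y, h, len(HTE_VALUES)))
            )
        # Normalize in log space for numerical stability.
        m = max(log_post.values())
        w = {y: math.exp(v - m) for y, v in log_post.items()}
        return w[True] / (w[True] + w[False])


# Made-up toy labeled pairs for illustration only (not E. coli data).
train = ([("high", True, True)] * 3 + [("mid", False, True)]
         + [("low", False, False)] * 4 + [("mid", True, False)])
model = NaiveBayesPPI().fit(train)
print(round(model.predict_proba(score_bin(5.1), True), 3))
print(round(model.predict_proba(score_bin(0.7), False), 3))
```

Multiplying per-feature likelihoods is what lets weak threading evidence and noisy HTE evidence reinforce each other: a pair supported by both sources gets a much higher posterior than one supported by either alone.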
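The TM-score thresholds quoted in the abstract (> 0.5 for a correct fold) refer to the template modeling score. As a sketch: for one fixed superposition of a model onto a reference of length L, with d_i the distance between the i-th aligned residue pair, the score is (1/L) Σ 1/(1 + (d_i/d_0)²) with d_0 = 1.24 ∛(L − 15) − 1.8 Å. The full TM-score is the maximum of this quantity over superpositions, a search the sketch omits. The 0.5 Å floor on d_0 for short chains is an assumption modeled on a convention used in TM-align, and how the chains of a dimer are joined for a complex-level score is not shown.

```python
def d0(target_length: int) -> float:
    """Distance scale d0 (angstroms) used to normalize residue deviations."""
    # Assumed small-protein convention: fall back to 0.5 A when the formula is undefined or too small.
    if target_length > 21:
        return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)
    return 0.5


def tm_score(distances, target_length: int) -> float:
    """TM-score for ONE fixed superposition.

    distances: per-pair distances (A) between aligned residues of model and reference.
    target_length: number of residues in the reference structure (normalization length).
    The true TM-score maximizes this value over superpositions; that search is omitted here.
    """
    if len(distances) > target_length:
        raise ValueError("more aligned pairs than reference residues")
    scale = d0(target_length)
    return sum(1.0 / (1.0 + (d / scale) ** 2) for d in distances) / target_length


# A perfect model aligned over every residue of a 100-residue reference.
print(tm_score([0.0] * 100, 100))  # prints 1.0
# Aligning only half of the reference perfectly.
print(tm_score([0.0] * 50, 100))  # prints 0.5
```

Because unaligned reference residues contribute nothing while the sum is still divided by the full reference length, the score penalizes partial coverage as well as large deviations.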