Sorbonne Université, CNRS, IBPS, UMR 7238, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), 75005 Paris, France.
Sorbonne Université, CNRS, Institut des Sciences du Calcul et des Données (ISCD), 75005 Paris, France.
PLoS Comput Biol. 2020 Feb 3;16(2):e1007624. doi: 10.1371/journal.pcbi.1007624. eCollection 2020 Feb.
Interactions between proteins and nucleic acids are at the heart of many essential biological processes. Despite increasing structural information about how these interactions may take place, our understanding of how nucleic acids use protein surfaces is still very limited. This is in part due to the inherent complexity associated with protein surface deformability and evolution. In this work, we present a method that helps decipher this complexity by predicting protein-DNA interfaces and characterizing their properties. It relies on three biologically and physically meaningful descriptors, namely evolutionary conservation, physico-chemical properties and surface geometry. We carefully assessed its performance on several hundred protein structures and compared it to several state-of-the-art machine-learning methods. Our approach achieves a higher sensitivity than the other methods, with a similar precision. Importantly, we show that it is able to unravel 'hidden' binding sites by applying it to unbound protein structures and to proteins binding to DNA via multiple sites and in different conformations. It is also applicable to the detection of RNA-binding sites, without significant loss of performance. This confirms that DNA- and RNA-binding sites share similar properties. Our method is implemented as a fully automated tool, JET2DNA, freely accessible at: http://www.lcqb.upmc.fr/JET2DNA. We also provide a new dataset of 187 protein-DNA complex structures, along with a subset of 82 associated unbound structures. The set represents the largest body of high-resolution crystallographic structures of protein-DNA complexes, uses biological protein assemblies as DNA-binding units, and covers all major types of protein-DNA interactions. It is available at: http://www.lcqb.upmc.fr/PDNAbenchmarks.