Department of Chemistry and Laser Chemistry Institute, Fudan University, Shanghai, 200433, P.R. China.
Department of Computer Science and Engineering, The University of Texas at Arlington, Arlington, TX, 76019, USA.
Sci Rep. 2017 Nov 2;7(1):14916. doi: 10.1038/s41598-017-14877-w.
The key finding of the DNA double helix model is the specific pairing, or binding, between nucleotides A-T and C-G, and these pairing rules are the molecular basis of the genetic code. Unfortunately, no such rules have been discovered for proteins. Here we show that intrinsic sequence patterns exist between intra-protein binding peptide fragments, that they can be extracted using a deep learning algorithm, and that they bear an interesting resemblance to the DNA double helix model. The intra-protein binding peptide fragments have specific and intrinsic sequence patterns, distinct from those of non-binding peptide fragments, and millions of binding and non-binding peptide fragments from currently available protein X-ray structures are classified with an accuracy of up to 93%. The specific binding between short peptide fragments may provide an important driving force for protein folding and protein-protein interaction, two open and fundamental problems in molecular biology, and it may have significant potential in the design, discovery, and development of peptide, protein, and antibody drugs.