Key Lab of Intelligent Information Processing, Big Data Academy, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China.
School of Computer Science, University of Chinese Academy of Sciences, Beijing, China.
BMC Bioinformatics. 2020 Nov 5;21(1):503. doi: 10.1186/s12859-020-03793-y.
The formation of contacts among protein secondary structure elements (SSEs) is an important step in protein folding because it determines the topology of the protein tertiary structure; hence, inferring inter-SSE contacts is crucial to protein structure prediction. One existing strategy infers inter-SSE contacts directly from the predicted probabilities of inter-residue contacts without any preprocessing, and thus suffers from the excessive noise in the predicted inter-residue contacts. Another strategy first defines SSEs based on protein secondary structure prediction, and then judges whether each candidate SSE pair forms a contact. However, errors in secondary structure prediction make it difficult to determine the boundaries of SSEs accurately, and incorrectly deduced SSEs hinder the subsequent prediction of the contacts among them.
Here we report an accurate approach to inferring inter-SSE contacts (hence the name ISSEC) using the deep object detection technique. The design of ISSEC is based on the observation that, in the inter-residue contact map, contacting SSEs usually form rectangular regions with characteristic patterns. ISSEC therefore infers inter-SSE contacts by detecting such rectangular regions. Unlike the existing approach that directly uses the predicted probabilities of inter-residue contacts, ISSEC applies deep convolution to extract high-level features from the inter-residue contacts. More importantly, ISSEC does not rely on pre-defined SSEs. Instead, ISSEC enumerates multiple candidate rectangular regions in the predicted inter-residue contact map and, for each region, calculates a confidence score that measures whether the region has the characteristic patterns. ISSEC then employs a greedy strategy to select non-overlapping regions with high confidence scores, and finally infers inter-SSE contacts from these regions.
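The final selection step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the `Region` type, the `overlaps` and `greedy_select` helpers, the inclusive residue-index coordinates, the `min_score` threshold, and the strict "no shared cell" overlap test are all assumptions made for illustration, and the candidate regions with their scores are taken as given rather than produced by a detection network.

```python
from typing import List, NamedTuple


class Region(NamedTuple):
    """Hypothetical candidate rectangle in a contact map (inclusive indices)."""
    row_start: int
    row_end: int
    col_start: int
    col_end: int
    score: float  # confidence that the rectangle shows an inter-SSE contact pattern


def overlaps(a: Region, b: Region) -> bool:
    """Return True if the two rectangles share at least one contact-map cell."""
    rows = a.row_start <= b.row_end and b.row_start <= a.row_end
    cols = a.col_start <= b.col_end and b.col_start <= a.col_end
    return rows and cols


def greedy_select(candidates: List[Region], min_score: float = 0.5) -> List[Region]:
    """Greedily keep high-scoring regions that do not overlap any kept region.

    min_score is an assumed cut-off, not a value from the paper.
    """
    selected: List[Region] = []
    # Visit candidates from the most to the least confident.
    for region in sorted(candidates, key=lambda r: r.score, reverse=True):
        if region.score < min_score:
            break  # all remaining candidates score lower still
        if not any(overlaps(region, kept) for kept in selected):
            selected.append(region)
    return selected


candidates = [
    Region(0, 5, 10, 15, 0.9),
    Region(3, 8, 12, 18, 0.8),    # overlaps the first region
    Region(20, 25, 30, 35, 0.7),
    Region(40, 45, 50, 55, 0.3),  # below the assumed threshold
]
selected = greedy_select(candidates)
```

In this toy input, the second candidate is dropped because it overlaps a higher-scoring region, and the fourth is dropped by the assumed threshold. Each retained rectangle spans a row range and a column range of residues, which would correspond to the two SSE segments inferred to be in contact.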
Comprehensive experimental results suggest that ISSEC outperforms the state-of-the-art approaches in predicting inter-SSE contacts. We further demonstrate that ISSEC can be applied to improve the prediction of both inter-residue contacts and tertiary structure.