Signal Processing Laboratory, Griffith University, Brisbane, Queensland 4122, Australia.
Institute for Glycomics and School of Information and Communication Technology, Griffith University, Southport, Queensland 4222, Australia.
J Chem Inf Model. 2018 Nov 26;58(11):2369-2376. doi: 10.1021/acs.jcim.8b00636. Epub 2018 Nov 13.
Recognition of the widespread existence of intrinsically disordered regions in proteins has spurred the development of computational techniques for their detection. Existing techniques can be classified into methods that rely on single-sequence information and methods that rely on evolutionary sequence profiles generated from multiple-sequence alignments. Profile-based methods are, in general, more accurate because the presence or absence of conserved amino acid residues in a protein sequence provides important information on the structural and functional roles of the residues. However, the wide applicability of profile-based techniques is limited by the time-consuming calculation of sequence profiles. Here we demonstrate that the performance gap between profile-based techniques and single-sequence methods can be reduced by using an ensemble of deep recurrent and convolutional neural networks that allow whole-sequence learning. In particular, the single-sequence method (called SPOT-Disorder-Single) is more accurate than SPOT-Disorder (a profile-based method) for proteins with few homologous sequences and comparable to it in predicting long disordered regions. The method's performance is robust across four independent test sets with different amounts of short and long disordered regions. SPOT-Disorder-Single is available as a Web server and as a standalone program at http://sparks-lab.org/jack/server/SPOT-Disorder-Single.
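The single-sequence, ensemble idea described above can be illustrated with a minimal sketch. It is not the SPOT-Disorder-Single implementation: the one-hot encoding, the 0.5 threshold, the function names, and the two toy "models" are all assumptions made for illustration, and the toy models merely stand in for trained recurrent and convolutional networks. The sketch shows only the general pattern: encode the raw sequence without an alignment-derived profile, obtain per-residue disorder probabilities from each ensemble member, and average them.

```python
# Illustrative sketch only: the encoding, threshold, and toy models below are
# assumptions, not the published SPOT-Disorder-Single implementation.
from typing import Callable, List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# A model maps per-residue feature vectors to per-residue disorder probabilities.
Model = Callable[[List[List[float]]], List[float]]


def one_hot_encode(sequence: str) -> List[List[float]]:
    """Encode each residue as a 20-dimensional one-hot vector.

    Residues outside the 20 standard amino acids become all-zero vectors.
    """
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    encoded = []
    for residue in sequence.upper():
        row = [0.0] * len(AMINO_ACIDS)
        if residue in index:
            row[index[residue]] = 1.0
        encoded.append(row)
    return encoded


def ensemble_predict(
    models: Sequence[Model], sequence: str, threshold: float = 0.5
) -> Tuple[List[float], str]:
    """Average per-residue probabilities over all models and label each residue.

    Returns the mean probabilities and a string with 'D' (disordered) or
    'O' (ordered) for every residue.
    """
    if not models:
        raise ValueError("at least one model is required")
    features = one_hot_encode(sequence)
    per_model = [model(features) for model in models]
    mean_probs = [
        sum(probs[i] for probs in per_model) / len(per_model)
        for i in range(len(features))
    ]
    labels = "".join("D" if p >= threshold else "O" for p in mean_probs)
    return mean_probs, labels


# Hypothetical stand-ins for trained networks: simple per-residue rules.
P_INDEX = AMINO_ACIDS.index("P")
S_INDEX = AMINO_ACIDS.index("S")


def toy_model_a(features: List[List[float]]) -> List[float]:
    return [0.9 if row[P_INDEX] == 1.0 else 0.2 for row in features]


def toy_model_b(features: List[List[float]]) -> List[float]:
    return [0.7 if row[P_INDEX] == 1.0 or row[S_INDEX] == 1.0 else 0.1 for row in features]


if __name__ == "__main__":
    probabilities, labels = ensemble_predict([toy_model_a, toy_model_b], "APSL")
    print(probabilities, labels)
```

Averaging is used here as the simplest way to combine ensemble members; the abstract does not state how the networks' outputs are combined, so a different scheme may be used in practice.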