Advanced Networks Research Group, School of Information Technologies (J12), the University of Sydney, NSW 2006, Australia.
BMC Genomics. 2009 Dec 3;10 Suppl 3(Suppl 3):S21. doi: 10.1186/1471-2164-10-S3-S21.
In this paper, we introduce a novel approach to protein domain boundary prediction that integrates inter-range interactions. It involves (1) the design of a modular kernel algorithm that can effectively exploit information from non-local interactions between amino acids, and (2) the development of a novel profile that supplies suitable information to this algorithm. A key feature of the profiling technique is that it uses multiple structural alignments of remote homologues to create an extended sequence profile, and combines this structural information with chemical information (hydrophobicity) that plays an important role in protein stability. The resulting profile can capture the sequence characteristics of an entire structural superfamily and extends the range of profiles that can be generated from sequence similarity alone.
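The idea of an extended profile that joins homology information with a hydrophobicity channel can be illustrated with a small sketch. It is not the authors' implementation. It assumes that the homology part is a per-column amino-acid frequency profile computed from an already aligned set of homologous sequences. It also assumes that the Kyte-Doolittle hydropathy index stands in for the SARAH1 scale, whose exact encoding is not reproduced here. The names `column_frequencies` and `extended_profile` are hypothetical.

```python
from typing import Dict, List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy index. ASSUMPTION: used here only as a stand-in
# for the SARAH1 scale used in the paper.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def column_frequencies(column: str) -> List[float]:
    """Relative frequency of each amino acid in one alignment column.

    Gaps ('-') and non-standard symbols are ignored.
    """
    residues = [aa for aa in column if aa in KYTE_DOOLITTLE]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(aa) / len(residues) for aa in AMINO_ACIDS]


def extended_profile(alignment: Sequence[str]) -> List[List[float]]:
    """Build one 21-value feature vector per alignment column (hypothetical helper).

    Values 0-19 are amino-acid frequencies (the homology part); value 20 is
    the frequency-weighted hydrophobicity of the column (the chemical part).
    """
    if not alignment or len({len(s) for s in alignment}) != 1:
        raise ValueError("alignment must be non-empty with equal-length rows")
    profile: List[List[float]] = []
    for i in range(len(alignment[0])):
        freqs = column_frequencies("".join(seq[i] for seq in alignment))
        # Expected hydrophobicity of the residues observed in this column.
        hydro = sum(f * KYTE_DOOLITTLE[aa] for f, aa in zip(freqs, AMINO_ACIDS))
        profile.append(freqs + [hydro])
    return profile


if __name__ == "__main__":
    # Toy alignment of three made-up homologous sequences.
    for vec in extended_profile(["ACDE", "AC-E", "GCDE"]):
        print(round(vec[-1], 3))
```

In the paper, the alignment rows come from multiple structural alignments of remote homologues rather than from sequence similarity. Any source of equal-length aligned strings works with this sketch.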
Our novel profile, which combines homology information with hydrophobicity values from the SARAH1 scale, successfully provided additional structural and chemical information. In addition, the modular approach adopted in our algorithm proved effective in capturing information from non-local interactions. Our approach achieved accuracies of 82.1%, 50.9% and 31.5% for one-domain, two-domain, and three-or-more-domain proteins, respectively.
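The abstract does not detail how the modular kernel is built. The following is therefore only a hypothetical illustration of one way a kernel can be assembled from modules so that residues far from the target position contribute to the similarity score. Each module collects the profile vectors of residues within one assumed band of sequence separation (short, medium, long). It compares them with an RBF kernel, and the module kernels are combined as a non-negative weighted sum. The ranges in `RANGES`, the weights, `gamma`, and all function names are assumptions and are not taken from the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple

Profile = Sequence[Sequence[float]]  # one feature vector per residue

# HYPOTHETICAL modules: each covers a band of sequence separations |offset|
# (in residues) from the residue being classified.
RANGES: Dict[str, Tuple[int, int]] = {
    "short": (0, 3),
    "medium": (4, 12),
    "long": (13, 24),
}


def range_features(profile: Profile, pos: int, lo: int, hi: int) -> List[float]:
    """Concatenate vectors of residues whose |offset| from pos lies in [lo, hi].

    Offsets on both sides of pos are used; positions outside the sequence are
    zero-padded so every residue yields a vector of the same length.
    """
    dim = len(profile[0])
    offsets = sorted({s * d for d in range(lo, hi + 1) for s in (-1, 1)})
    feats: List[float] = []
    for off in offsets:
        j = pos + off
        feats.extend(profile[j] if 0 <= j < len(profile) else [0.0] * dim)
    return feats


def rbf(x: Sequence[float], y: Sequence[float], gamma: float) -> float:
    """Gaussian (RBF) kernel exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def modular_kernel(p1: Profile, i: int, p2: Profile, j: int,
                   weights: Dict[str, float], gamma: float = 0.1) -> float:
    """Non-negative weighted sum of one RBF kernel per range module.

    A non-negative combination of valid kernels is itself a valid kernel,
    so the result can be used in a standard kernel method.
    """
    if any(w < 0 for w in weights.values()):
        raise ValueError("module weights must be non-negative")
    total = 0.0
    for name, (lo, hi) in RANGES.items():
        x = range_features(p1, i, lo, hi)
        y = range_features(p2, j, lo, hi)
        total += weights[name] * rbf(x, y, gamma)
    return total
```

Keeping one kernel per range lets each band of sequence separation be weighted separately. A Gram matrix built from `modular_kernel` could then be passed to any learner that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.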
The experimental results of this study are encouraging; however, more work is needed to extend the approach to a broader range of applications. We are currently developing a novel interactive (human-in-the-loop) profiling technique that can provide information from more distantly related homologues. This approach is expected to further enhance the current study.