Department of Computer Science, University of Bologna, Bologna, Italy.
BioData Min. 2011 Jan 13;4(1):1. doi: 10.1186/1756-0381-4-1.
Present knowledge of protein structures at the atomic level derives from some 60,000 molecules. Yet the exponentially growing set of hypothetical protein sequences comprises some 10 million chains, which makes protein structure prediction one of the most challenging goals of bioinformatics. In this context, representing a protein as a contact map is an intermediate step in fold recognition, and contact maps are what contact map predictors produce. However, contact map representations require fast and reliable methods to reconstruct the specific fold of the protein backbone.
In this paper, using GRID technology, we benchmark FT-COMAR, our algorithm for 3D reconstruction, on a large set of 1716 non-redundant proteins while taking random noise into consideration. This makes our computation the largest ever performed for this task.
We observe the effects of introducing random noise on 3D reconstruction and derive some considerations useful for future implementations. The size of the protein set also allows statistical analysis after grouping by SCOP structural class.
Altogether, our data indicate that the quality of 3D reconstruction is unaffected by deleting up to an average of 75% of the real contacts, whereas replacing only a few percent of non-contacts with randomly generated contacts is sufficient to hamper 3D reconstruction.
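The two kinds of noise contrasted above can be illustrated with a minimal Python sketch. It is not FT-COMAR and does not reproduce the paper's protocol: the 8 Å distance threshold, the function names `contact_map` and `perturb`, and the use of one point per residue are all assumptions made for illustration only.

```python
import math
import random


def contact_map(coords, threshold=8.0):
    """Binary contact map from a list of (x, y, z) residue coordinates.

    The threshold (in angstroms) is an assumed, illustrative value.
    """
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= threshold:
                cmap[i][j] = cmap[j][i] = 1
    return cmap


def perturb(cmap, delete_frac=0.0, add_frac=0.0, seed=0):
    """Return a noisy copy of a symmetric contact map.

    delete_frac: fraction of real contacts switched off.
    add_frac: fraction of non-contacts switched on at random.
    """
    rng = random.Random(seed)
    n = len(cmap)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    contacts = [p for p in pairs if cmap[p[0]][p[1]] == 1]
    non_contacts = [p for p in pairs if cmap[p[0]][p[1]] == 0]

    noisy = [row[:] for row in cmap]
    # Delete a random subset of the real contacts.
    for i, j in rng.sample(contacts, round(delete_frac * len(contacts))):
        noisy[i][j] = noisy[j][i] = 0
    # Turn a random subset of the original non-contacts into false contacts.
    for i, j in rng.sample(non_contacts, round(add_frac * len(non_contacts))):
        noisy[i][j] = noisy[j][i] = 1
    return noisy
```

Calling `perturb(cmap, delete_frac=0.75)` mimics the contact-deletion setting, and `perturb(cmap, add_frac=0.05)` mimics the insertion of false contacts; a reconstruction method would then be run on the noisy map and compared with the native structure.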