Biswas Abhishek, Si Dong, Al Nasr Kamal, Ranjan Desh, Zubair Mohammad, He Jing
Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA.
J Bioinform Comput Biol. 2012 Jun;10(3):1242006. doi: 10.1142/S0219720012420061.
Determining the secondary structure topology is a critical step in deriving the atomic structure of a protein from a density map obtained by electron cryo-microscopy (cryo-EM). This step often relies on matching two sources of information. One source is the set of secondary structures detected in the protein density map at medium resolution, such as 5-10 Å. The other source is the set of secondary structures predicted from the amino acid sequence. Because either source of information may be inaccurate, a pool of possible secondary structure positions needs to be sampled. This paper studies how to reduce the computation of the mapping when the inaccuracy of the secondary structure predictions is taken into account. We present a method that combines the concept of a dynamic graph with our previous work on using a constrained shortest path to identify the topology of the secondary structures. We show a 34.55% reduction in run time compared with the naïve way of handling the inaccuracies. We also show improved accuracy when the potential secondary structure errors are explicitly sampled, versus using a single consensus prediction. Our framework demonstrates the potential of developing computationally efficient exact algorithms to identify the optimal topology of the secondary structures when the inaccuracy of the predicted data is considered.
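To make the "naïve way of handling the inaccuracies" concrete, the following is a minimal sketch under simplifying assumptions; it is not the authors' algorithm and does not implement their dynamic-graph method. It assumes that each detected secondary structure element (SSE) is a "stick" with two 3D endpoints, that each predicted SSE is a (start, end) residue range in sequence order, and that every predicted SSE is placed on a distinct stick with an orientation. The cost is the total Euclidean length of the connecting loops, and a connection is infeasible when the gap exceeds an assumed 3.8 Å per residue step. The hypothetical `solve_topology` swaps in a plain bitmask dynamic program, which is a shortest path over (used sticks, last stick, orientation) states, so it is exponential in the number of sticks. The hypothetical `best_over_predictions` is the naïve baseline: it re-solves every sampled prediction from scratch and keeps the best.

```python
import math

# Assumed upper bound (Å) on the distance spanned by one residue step in a loop.
MAX_SPAN_PER_RESIDUE = 3.8


def solve_topology(sticks, segments):
    """Hypothetical sketch: assign ordered predicted SSEs to detected sticks.

    sticks:   list of (p, q) endpoint pairs, each a 3D point tuple.
    segments: list of (start, end) residue indices of predicted SSEs, in order.
    Returns (cost, [(stick_index, forward), ...]) or (math.inf, None).
    """
    n, m = len(segments), len(sticks)
    if n == 0 or n > m:
        return math.inf, None

    def entry(s, forward):
        return sticks[s][0] if forward else sticks[s][1]

    def exit_(s, forward):
        return sticks[s][1] if forward else sticks[s][0]

    # best[(mask, stick, forward)] = (path cost so far, predecessor state)
    best = {}
    for s in range(m):
        for d in (True, False):
            best[(1 << s, s, d)] = (0.0, None)

    # Every transition adds a bit, so numeric mask order is a valid topological order.
    for mask in range(1, 1 << m):
        k = bin(mask).count("1")  # number of segments already placed
        if k >= n:
            continue
        steps = segments[k][0] - segments[k - 1][1]  # residue steps across the loop
        max_span = steps * MAX_SPAN_PER_RESIDUE
        for s in range(m):
            for d in (True, False):
                state = (mask, s, d)
                if state not in best:
                    continue
                cost, start = best[state][0], exit_(s, d)
                for t in range(m):
                    if mask & (1 << t):
                        continue  # each stick is used at most once
                    for e in (True, False):
                        dist = math.dist(start, entry(t, e))
                        if dist > max_span:
                            continue  # loop too short to bridge this gap
                        nxt = (mask | (1 << t), t, e)
                        if nxt not in best or cost + dist < best[nxt][0]:
                            best[nxt] = (cost + dist, state)

    finals = [(c, st) for st, (c, _) in best.items() if bin(st[0]).count("1") == n]
    if not finals:
        return math.inf, None
    cost, state = min(finals)
    path = []
    while state is not None:  # walk predecessors back to the first segment
        path.append((state[1], state[2]))
        state = best[state][1]
    path.reverse()
    return cost, path


def best_over_predictions(sticks, candidate_predictions):
    """Naive baseline: solve each sampled prediction independently, keep the best."""
    best_cost, best_idx, best_assign = math.inf, None, None
    for i, segments in enumerate(candidate_predictions):
        cost, assign = solve_topology(sticks, segments)
        if cost < best_cost:
            best_cost, best_idx, best_assign = cost, i, assign
    return best_cost, best_idx, best_assign
```

For example, with two sticks 5 Å apart, a prediction whose loop has one residue step cannot bridge the gap, while a shifted prediction with two steps can. Sampling both therefore recovers a topology that a single prediction would miss. Because the baseline repeats the entire search for every candidate, its cost grows linearly with the number of sampled predictions. Reducing that repeated work is the motivation for the method described above.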