Martin Joshua S, Simmons Katrina, Laederach Alain
Computational and Structural Biology Department, Wadsworth Center, Albany, NY 12208, USA.
Algorithms. 2009 Mar 1;2(1):200-214. doi: 10.3390/a2010200.
Unlike protein folding, the process by which a large RNA molecule adopts a functionally active conformation remains poorly understood. Chemical mapping techniques, such as hydroxyl radical (·OH) footprinting, report on local structural changes in an RNA as it folds, with single-nucleotide resolution. The analysis and interpretation of these kinetic data require the identification and subsequent optimization of a kinetic model and its parameters. We detail our approach to this problem, specifically focusing on a novel strategy to overcome a factorial explosion in the number of possible models that need to be tested to identify the best-fitting model. Previously, smaller systems (fewer than three intermediates) were computationally tractable using a distributed computing approach. However, for larger systems with three or more intermediates, the problem became computationally intractable. With our new enumeration strategy, we are able to significantly reduce the number of models that need to be tested using non-linear least squares optimization, allowing us to study systems with up to five intermediates. Furthermore, two-intermediate systems can now be analyzed on a desktop computer, which eliminates the need for a distributed computing solution for most medium-sized data sets. Our new approach also allows us to study potential degeneracy in kinetic model selection, elucidating the limits of the method when working with large systems. This work establishes clear criteria for determining whether experimental ·OH data are sufficient to determine the underlying kinetic model, or whether other experimental modalities are required to resolve any degeneracy.
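To make the idea of fitting a kinetic model by least squares concrete, the following is a minimal sketch and not the authors' implementation. It assumes the simplest possible case: one intermediate, with irreversible first-order steps U -> I -> F. It also assumes a hypothetical two-site readout in which one site is protected in both I and F and the other only in F. A coarse grid search stands in for a real non-linear least squares optimizer, and all function names, rate values, and time points are illustrative assumptions.

```python
import math

def populations(t, k1, k2):
    """Fractions of U, I, F at time t for irreversible U -> I -> F, starting from all U."""
    u = math.exp(-k1 * t)
    if math.isclose(k1, k2):
        # Limiting form of the intermediate population when the two rates coincide
        i = k1 * t * math.exp(-k1 * t)
    else:
        i = k1 / (k2 - k1) * (math.exp(-k1 * t) - math.exp(-k2 * t))
    return u, i, 1.0 - u - i

def predicted_protection(t, k1, k2):
    """Hypothetical readout: site A is protected in I and F, site B only in F."""
    u, i, f = populations(t, k1, k2)
    return (i + f, f)

def sum_sq_residuals(k1, k2, times, data):
    """Sum of squared differences between predicted and observed protections."""
    total = 0.0
    for t, obs in zip(times, data):
        pred = predicted_protection(t, k1, k2)
        total += sum((p - o) ** 2 for p, o in zip(pred, obs))
    return total

def grid_fit(times, data, candidate_rates):
    """Brute-force search over rate pairs (illustrative stand-in for an NLLS optimizer)."""
    pairs = [(k1, k2) for k1 in candidate_rates for k2 in candidate_rates]
    return min(pairs, key=lambda p: sum_sq_residuals(p[0], p[1], times, data))

# Synthetic, noise-free data generated from assumed "true" rates
times = [0.1 * n for n in range(1, 51)]
true_k1, true_k2 = 2.0, 0.5
data = [predicted_protection(t, true_k1, true_k2) for t in times]

# Candidate rates 0.25, 0.50, ..., 4.00 (arbitrary illustrative grid)
rates = [0.25 * n for n in range(1, 17)]
best = grid_fit(times, data, rates)
print(best)
```

In this sketch the two sites together pin down both rates. Fitting the F-only site alone would not, because the folded fraction of a sequential two-step model is unchanged when k1 and k2 are swapped. That is a small instance of the kind of degeneracy the abstract refers to.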