Zhao Chunchun, Sahni Sartaj
Department of Computer and Information Science and Engineering, University of Florida, Gainesville, 32611, FL, USA.
BMC Bioinformatics. 2017 Dec 6;18(Suppl 15):518. doi: 10.1186/s12859-017-1917-0.
An RNA folding/RNA secondary structure prediction algorithm determines the non-nested/pseudoknot-free structure by maximizing the number of complementary base pairs and minimizing the energy. Several implementations of Nussinov's classical RNA folding algorithm have been proposed. Our focus is to obtain run time and energy efficiency by reducing the number of cache misses.
Three cache-efficient algorithms, ByRow, ByRowSegment and ByBox, for Nussinov's RNA folding are developed. Using a simple LRU cache model, we show that the Classical algorithm of Nussinov has the highest number of cache misses followed by the algorithms Transpose (Li et al.), ByRow, ByRowSegment, and ByBox (in this order). Extensive experiments conducted on four computational platforms-Xeon E5, AMD Athlon 64 X2, Intel I7 and PowerPC A2-using two programming languages-C and Java-show that our cache efficient algorithms are also efficient in terms of run time and energy.
Our benchmarking shows that, depending on the computational platform and programming language, either ByRow or ByBox give best run time and energy performance. The C version of these algorithms reduce run time by as much as 97.2% and energy consumption by as much as 88.8% relative to Classical and by as much as 56.3% and 57.8% relative to Transpose. The Java versions reduce run time by as much as 98.3% relative to Classical and by as much as 75.2% relative to Transpose. Transpose achieves run time and energy efficiency at the expense of memory as it takes twice the memory required by Classical. The memory required by ByRow, ByRowSegment, and ByBox is the same as that of Classical. As a result, using the same amount of memory, the algorithms proposed by us can solve problems up to 40% larger than those solvable by Transpose.
RNA折叠/RNA二级结构预测算法通过最大化互补碱基对数量并最小化能量来确定非嵌套/无假结结构。已提出了几种Nussinov经典RNA折叠算法的实现方式。我们的重点是通过减少缓存未命中次数来获得运行时间和能量效率。
开发了三种针对Nussinov RNA折叠的缓存高效算法,即逐行算法(ByRow)、逐行分段算法(ByRowSegment)和逐盒算法(ByBox)。使用简单的LRU缓存模型,我们表明Nussinov的经典算法缓存未命中次数最多,其次是转置算法(Li等人提出)、逐行算法、逐行分段算法和逐盒算法(按此顺序)。在四个计算平台(至强E5、AMD速龙64 X2、英特尔I7和PowerPC A2)上使用两种编程语言(C和Java)进行的广泛实验表明,我们的缓存高效算法在运行时间和能量方面也很高效。
我们的基准测试表明,根据计算平台和编程语言的不同,逐行算法或逐盒算法具有最佳的运行时间和能量性能。相对于经典算法,这些算法的C版本运行时间最多可减少97.2%,能耗最多可减少88.8%;相对于转置算法,运行时间最多可减少56.3%,能耗最多可减少57.8%。Java版本相对于经典算法运行时间最多可减少98.3%,相对于转置算法最多可减少75.2%。转置算法以牺牲内存为代价实现运行时间和能量效率,因为它所需的内存是经典算法的两倍。逐行算法、逐行分段算法和逐盒算法所需的内存与经典算法相同。因此,在使用相同内存量的情况下,我们提出的算法能够解决比转置算法可解决的问题大40%的问题。