West Pomeranian University of Technology, Faculty of Computer Science, Zolnierska 49, Szczecin, 71-210, Poland.
BMC Bioinformatics. 2019 Apr 24;20(1):208. doi: 10.1186/s12859-019-2785-6.
An RNA primary structure, or sequence, is a single strand considered as a chain of nucleotides from the alphabet AUGC (adenine, uracil, guanine, cytosine). The strand can be folded onto itself, i.e., one segment of an RNA sequence might be paired with another segment of the same RNA sequence into a two-dimensional structure composed by a list of complementary base pairs, which are close together with the minimum energy. That list is called RNA's secondary structure and is predicted by an RNA folding algorithm. RNA secondary structure prediction is a computing-intensive task that lies at the core of search applications in bioinformatics.
We suggest a space-time tiling approach and apply it to generate parallel cache effective tiled code for RNA folding using Nussinov's algorithm.
Parallel tiled code generated with a suggested space-time loop tiling approach outperforms known related codes generated automatically by means of optimizing compilers and codes produced manually. The presented approach enables us to tile all the three loops of Nussinov's recurrence that is not possible with commonly known tiling techniques. Generated parallel tiled code is scalable regarding to the number of parallel threads - increasing the number of threads reduces code execution time. Defining speed up as the ratio of the time taken to run the original serial program on one thread to the time taken to run the tiled program on P threads, we achieve super-linear speed up (a value of speed up is greater than the number of threads used) for parallel tiled code against the original serial code up to 32 threads and super-linear speed up scalability (increasing speed up with increasing the thread number) up to 8 threads. For one thread used, speed up is about 4.2 achieved on an Intel Xeon machine used for carrying out experiments.
RNA 的一级结构,或序列,是一条被视为核苷酸链的单链,其字母表为 AUGC(腺嘌呤、尿嘧啶、鸟嘌呤、胞嘧啶)。该链可以自身折叠,即 RNA 序列的一段可能与同一 RNA 序列的另一段配对,形成由互补碱基对列表组成的二维结构,这些碱基对彼此靠近,能量最小。该列表称为 RNA 的二级结构,由 RNA 折叠算法预测。RNA 二级结构预测是一项计算密集型任务,是生物信息学中搜索应用程序的核心。
我们提出了一种时空平铺方法,并将其应用于使用 Nussinov 算法生成并行缓存有效的平铺代码,用于 RNA 折叠。
与通过优化编译器自动生成的已知相关代码以及手动生成的代码相比,使用建议的时空循环平铺方法生成的并行平铺代码的性能更好。所提出的方法使我们能够平铺 Nussinov 递归的所有三个循环,这是常见的平铺技术无法实现的。生成的并行平铺代码在可扩展性方面表现出色,即针对并行线程的数量进行扩展——增加线程数量可以减少代码执行时间。将速度定义为在一个线程上运行原始串行程序所花费的时间与在 P 个线程上运行平铺程序所花费的时间之比,我们针对原始串行代码实现了并行平铺代码的超线性加速(速度提升值大于使用的线程数),最高可达 32 个线程,以及超线性加速可扩展性(随着线程数量的增加加速提升),最高可达 8 个线程。在使用一个线程的情况下,在用于进行实验的 Intel Xeon 机器上实现的速度提升约为 4.2。