Chaichoompu Kridsadakorn, Kittitornkun Surin, Tongsima Sissades
Genome Institute, National Center for Genetic Engineering and Biotechnology, 113 Thailand Science Park, Paholyothin Road, Klong 1, Klong Luang, Pathumtani 12120, Thailand.
Bioinformation. 2007 Dec 30;2(5):182-4. doi: 10.6026/97320630002182.
Many computationally intensive bioinformatics programs written in C/C++, such as those for multiple sequence alignment and population structure analysis, are not multicore-aware. A multicore processor is an emerging CPU technology that combines two or more independent processors in a single package. The Single Instruction Multiple Data-stream (SIMD) paradigm is heavily used in this class of processors. Nevertheless, most popular compilers, including Microsoft Visual C/C++ 6.0 and the x86 GNU C compiler (gcc), do not automatically generate SIMD code that fully exploits these processors. To harness the power of the new multicore architecture, certain compiler techniques must be considered. This paper presents a generic compiling strategy that assists the compiler in improving the performance of bioinformatics applications written in C/C++. The proposed framework consists of two main steps: a multithreading strategy and a vectorizing strategy. By following these strategies, an application can achieve higher speedup by taking advantage of multicore architecture technology. Because the interconnection among multiple cores is extremely fast, we suggest that the proposed optimization could be more appropriate than parallelization on a small cluster computer, which has higher network latency and lower bandwidth.
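The two steps named above, multithreading and vectorizing, can be illustrated with a minimal sketch. It is not the authors' framework: the task (counting mismatches between two equal-length sequences), the function names `count_mismatches` and `hamming_distance_mt`, the use of POSIX threads, and the fixed `NUM_THREADS` value are all assumptions chosen for illustration. The inner loop is written in a branch-free form over contiguous data, which compilers can often vectorize at high optimization levels, although whether SIMD code is actually emitted depends on the compiler version and flags.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 4  /* assumed core count; tune for the target machine */

/* Work item for one thread: a contiguous slice of both sequences. */
typedef struct {
    const char *a;
    const char *b;
    size_t len;
    size_t mismatches;  /* per-thread result, so no locking is needed */
} slice_t;

/* Vectorizing step: a branch-free loop over contiguous, non-aliased data.
 * This form is a candidate for auto-vectorization; it is not guaranteed. */
static size_t count_mismatches(const char *restrict a,
                               const char *restrict b, size_t len)
{
    size_t count = 0;
    for (size_t i = 0; i < len; i++)
        count += (a[i] != b[i]);
    return count;
}

static void *worker(void *arg)
{
    slice_t *s = arg;
    s->mismatches = count_mismatches(s->a, s->b, s->len);
    return NULL;
}

/* Multithreading step (hypothetical entry point): split the sequences into
 * NUM_THREADS slices, process each slice in its own thread, sum the results. */
size_t hamming_distance_mt(const char *a, const char *b, size_t len)
{
    pthread_t threads[NUM_THREADS];
    slice_t slices[NUM_THREADS];
    int started[NUM_THREADS];
    size_t chunk = len / NUM_THREADS;
    size_t total = 0;

    assert(a != NULL && b != NULL);

    for (int t = 0; t < NUM_THREADS; t++) {
        size_t begin = (size_t)t * chunk;
        slices[t].a = a + begin;
        slices[t].b = b + begin;
        /* The last slice also takes the remainder of the division. */
        slices[t].len = (t == NUM_THREADS - 1) ? len - begin : chunk;
        slices[t].mismatches = 0;
        started[t] =
            (pthread_create(&threads[t], NULL, worker, &slices[t]) == 0);
        if (!started[t])
            worker(&slices[t]);  /* fall back to the calling thread */
    }
    for (int t = 0; t < NUM_THREADS; t++) {
        if (started[t])
            pthread_join(threads[t], NULL);
        total += slices[t].mismatches;
    }
    return total;
}
```

With gcc, a build along the lines of `gcc -O3 -pthread` would be a reasonable starting point. Each thread writes only to its own `slice_t`, so the threads share no mutable state and the partial counts are combined after the joins.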