Highly accelerated feature detection in proteomics data sets using modern graphics processing units.

Author information

Computer Science Department, Center for Bioinformatics, Saarland University, 66041 Saarbrücken, Germany.

Publication information

Bioinformatics. 2009 Aug 1;25(15):1937-43. doi: 10.1093/bioinformatics/btp294. Epub 2009 May 14.

DOI:10.1093/bioinformatics/btp294

PMID:19447788

Abstract

MOTIVATION

Mass spectrometry (MS) is one of the most important techniques for high-throughput analysis in proteomics research. Due to the large number of different proteins and their post-translationally modified variants, the amount of data generated by a single wet-lab MS experiment can easily exceed several gigabytes. Hence, the time necessary to analyze and interpret the measured data is often significantly longer than the time spent on sample preparation and the wet-lab experiment itself. Since the automated analysis of this data is hampered by noise and baseline artifacts, more sophisticated computational techniques are required to handle the recorded mass spectra. There is a clear trade-off between the performance and the quality of the analysis, and balancing the two is currently one of the most challenging problems in computational proteomics.

RESULTS

Using modern graphics processing units (GPUs), we implemented a feature-finding algorithm based on a hand-tailored adaptive wavelet transform that drastically reduces the computation time. Exploiting the multi-core architecture of current computing devices yields a further speedup, bringing the overall speedup to as much as roughly 200-fold in our computational experiments. In addition, we demonstrate that several approximations necessary on the CPU to keep run times bearable become obsolete on the GPU, yielding results that are not only faster to obtain but also improved.

AVAILABILITY

An open source implementation of the CUDA-based algorithm is available via the software framework OpenMS (http://www.openms.de).

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
