Schurz Alioune, Su Bo-Han, Tu Yi-Shu, Lu Tony Tsung-Yu, Lin Olivia A, Tseng Yufeng J
Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, No. 1 Sec. 4, Roosevelt Road, Taipei, 106, Taiwan.
Department of Computer Science and Information Engineering, National Taiwan University, No. 1 Sec. 4, Roosevelt Road, Taipei, 106, Taiwan.
J Cheminform. 2017 Sep 15;9(1):50. doi: 10.1186/s13321-017-0238-7.
GPU acceleration is useful for solving complex chemical information problems. Identifying unknown structures from the mass spectra of natural product mixtures has long been a desirable yet unresolved goal in metabolomics. This elucidation process is hampered by complex experimental data and by the inability of instruments to completely separate different compounds. With current high-resolution mass spectrometry, one feasible strategy is to frame the problem as extending a scaffold database with sidechains of different probabilities so that the resulting structures match the accurate mass obtained from a high-resolution mass spectrum. A dynamic programming (DP) algorithm makes it possible to solve this NP-complete problem in pseudo-polynomial time. However, the running time of the DP algorithm grows by orders of magnitude as the number of decimal digits in the mass increases, which limits the gain in structural prediction capability. By harnessing the massively parallel architecture of modern GPUs, we designed a "compute unified device architecture" (CUDA)-based GPU-accelerated mixture elucidator (G.A.M.E.) that considerably improves the performance of the DP, allowing up to five decimal digits for input mass data. As exemplified by four test datasets of natural products with verified constitutions, G.A.M.E. allows efficient and automatic structural elucidation of unknown mixtures in practical procedures.
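To illustrate why the DP is pseudo-polynomial and why each extra decimal digit is costly, the following is a minimal CPU-side sketch in Python. It is not the G.A.M.E. implementation and uses no GPU; it is a generic subset-sum-style counting DP written under stated assumptions. The function names (`to_units`, `count_decorations`), the `sites` limit on the number of sidechains, and all mass values are hypothetical, and sidechain probabilities are ignored. Masses are scaled to integers by `10**digits`, so the DP table has one column per mass unit and grows roughly tenfold with every additional decimal digit.

```python
from decimal import Decimal


def to_units(mass: str, digits: int) -> int:
    """Scale a decimal mass string to an integer count of 10**-digits units."""
    return int((Decimal(mass) * (10 ** digits)).to_integral_value())


def count_decorations(scaffold, sidechains, target, sites, digits):
    """Count multisets of at most `sites` sidechains whose masses, added to
    the scaffold mass, hit the target mass exactly at the given precision.

    All arguments are illustrative assumptions, not values from the paper.
    """
    gap = to_units(target, digits) - to_units(scaffold, digits)
    if gap < 0:
        return 0
    weights = [to_units(m, digits) for m in sidechains]

    # ways[k][m] = number of multisets of exactly k sidechains with total mass m.
    # The table has (sites + 1) * (gap + 1) cells, and gap scales with 10**digits.
    ways = [[0] * (gap + 1) for _ in range(sites + 1)]
    ways[0][0] = 1

    # Processing one sidechain type at a time counts combinations, not orderings.
    for w in weights:
        for k in range(1, sites + 1):
            for m in range(w, gap + 1):
                ways[k][m] += ways[k - 1][m - w]

    return sum(ways[k][gap] for k in range(sites + 1))


if __name__ == "__main__":
    chains = ["15.02", "17.00", "31.02"]  # hypothetical sidechain masses
    print(count_decorations("100.00", chains, "132.02", sites=2, digits=2))
```

In this sketch the inner loop over `m` runs once per mass unit, so moving from two to five decimal digits multiplies the work by about a thousand. Cells in one row `k` that depend only on the previous row are the kind of independent updates that a parallel architecture could compute concurrently, which is the general motivation the abstract gives for moving the DP to a GPU.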