The Wang Yanan Institute for Studies in Economics, Xiamen University, Xiamen 361000, China.
Bioinformatics. 2012 Apr 1;28(7):914-20. doi: 10.1093/bioinformatics/bts078. Epub 2012 Feb 10.
Nuclear magnetic resonance (NMR) has been widely used as a powerful tool to determine the 3D structures of proteins in vivo. However, the post-spectra processing stage of NMR structure determination, which comprises peak picking, chemical shift assignment and structure calculation, usually requires a tremendous amount of time and expert knowledge. Detecting accurate peaks from the NMR spectra is a prerequisite for all subsequent steps, and thus remains a key problem in automatic NMR structure determination.
We introduce WaVPeak, a fully automatic peak detection method. WaVPeak first smooths the given NMR spectrum by wavelets. The peaks are then identified as the local maxima. False-positive peaks are filtered out efficiently by considering the volume of the peaks. WaVPeak has two major advantages over the state-of-the-art peak-picking methods. First, because it uses wavelet-based smoothing, WaVPeak does not eliminate any data point in the spectra. Therefore, WaVPeak is able to detect weak peaks that are embedded in the noise level, which are exactly the peaks NMR spectroscopists most need help isolating. Second, WaVPeak estimates the volume of the peaks to filter out false positives. This is more reliable than the intensity-based filters widely used in existing methods. We evaluate the performance of WaVPeak on the benchmark set proposed by PICKY (Alipanahi et al., 2009), one of the most accurate methods in the literature. The dataset comprises 32 2D and 3D spectra from eight different proteins. Experimental results demonstrate that WaVPeak achieves an average recall of 96%, 91%, 88%, 76% and 85% on (15)N-HSQC, HNCO, HNCA, HNCACB and CBCA(CO)NH, respectively. When the same number of peaks is considered, WaVPeak significantly outperforms PICKY.
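The abstract names three stages (wavelet smoothing, local-maximum detection and volume-based filtering of false positives) without giving their details. The following is a minimal 1D sketch of such a pipeline under stated assumptions, not WaVPeak's implementation: the single-level Haar wavelet with soft thresholding, the "volume" rule (summing positive values while walking downhill from the apex), the thresholds and the helper names `haar_smooth`, `local_maxima`, `peak_volume` and `pick_peaks` are all hypothetical, and real spectra are 2D or 3D.

```python
import math

# Hypothetical 1D illustration of "smooth, find maxima, filter by volume";
# none of these helpers come from the WaVPeak source code.


def haar_smooth(signal, threshold):
    """One-level Haar wavelet smoothing (hypothetical helper).

    Detail coefficients are soft-thresholded and the signal is reconstructed,
    so the output has the same length as the input: no point is discarded.
    """
    n = len(signal)
    # Pad odd-length input by repeating the last sample so points pair up.
    padded = list(signal) + ([signal[-1]] if n % 2 else [])
    root2 = math.sqrt(2.0)
    approx = [(padded[i] + padded[i + 1]) / root2 for i in range(0, len(padded), 2)]
    detail = [(padded[i] - padded[i + 1]) / root2 for i in range(0, len(padded), 2)]
    # Soft thresholding: shrink each detail coefficient toward zero by
    # `threshold`, zeroing the small, noise-like ones.
    detail = [math.copysign(max(abs(d) - threshold, 0.0), d) for d in detail]
    smoothed = []
    for a, d in zip(approx, detail):
        # Inverse Haar transform of one (approximation, detail) pair.
        smoothed.append((a + d) / root2)
        smoothed.append((a - d) / root2)
    return smoothed[:n]  # drop the padding sample, if any


def local_maxima(values):
    """Indices whose value exceeds the left neighbor and is >= the right one."""
    return [
        i
        for i in range(1, len(values) - 1)
        if values[i] > values[i - 1] and values[i] >= values[i + 1]
    ]


def peak_volume(values, apex):
    """Hypothetical 1D 'volume': sum of the values reached by walking outward
    from the apex while they stay positive and do not rise again."""
    left = apex
    while left > 0 and 0 < values[left - 1] <= values[left]:
        left -= 1
    right = apex
    while right < len(values) - 1 and 0 < values[right + 1] <= values[right]:
        right += 1
    return sum(values[left:right + 1])


def pick_peaks(spectrum, wavelet_threshold, min_volume):
    """Smooth, find local maxima, then keep only peaks with enough volume."""
    smoothed = haar_smooth(spectrum, wavelet_threshold)
    return [
        i
        for i in local_maxima(smoothed)
        if peak_volume(smoothed, i) >= min_volume
    ]


# Toy 1D trace: a strong peak, a one-point spike and a weak but broad peak.
spectrum = [0, 0, 1, 3, 6, 9, 6, 3, 1, 0, 0, 0, 2.5, 0, 0, 0,
            1, 1.5, 2, 2, 1.5, 1, 0, 0]
print(pick_peaks(spectrum, wavelet_threshold=1.0, min_volume=5.0))  # [5, 18]
```

On this toy trace the one-point spike at index 12 has a higher raw intensity (2.5) than the broad peak's maximum (2), yet its smoothed volume is only about 2.5 versus about 9 for the broad peak, so the volume cutoff keeps index 18 and drops index 12; the shoulder maximum at index 16 is rejected the same way. Both parameters here are hand-picked for the example; a practical version would presumably derive them from the noise level.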
WaVPeak is an open source program. The source code and two test spectra of WaVPeak are available at http://faculty.kaust.edu.sa/sites/xingao/Pages/Publications.aspx. The online server is under construction.
statliuzhi@xmu.edu.cn; ahmed.abbas@kaust.edu.sa; majing@ust.hk; xin.gao@kaust.edu.sa.