Sweredoski Michael J, Donovan Kevin J, Nguyen Bao D, Shaka A J, Baldi Pierre
Department of Computer Science, Institute for Genomics and Bioinformatics, University of California, Irvine, USA.
Bioinformatics. 2007 Nov 1;23(21):2829-35. doi: 10.1093/bioinformatics/btm406. Epub 2007 Sep 25.
Recent advances in cell-free protein expression systems allow specific labeling of proteins with amino acids containing stable isotopes ((15)N, (13)C and (2)H), an important feature for protein structure determination by nuclear magnetic resonance (NMR) spectroscopy. Given this labeling ability, we present a mathematical optimization framework for designing a set of protein isotopomers, or labeling schedules, to reduce the congestion in the NMR spectra. The labeling schedules, which are derived by the optimization of a cost function, are tailored to a specific protein and NMR experiment.
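The idea of choosing a labeling schedule by minimizing a cost function can be illustrated with a toy sketch. This is not the authors' algorithm: it assumes each amino acid type is (15)N-labeled in exactly one sample, treats the cost as the number of peak pairs in the same sample that lie within a fixed tolerance of each other, and uses exhaustive search rather than dynamic programming. The function names, tolerances and peak positions below are hypothetical and chosen only for illustration.

```python
from itertools import product

# Each peak is (amino acid type, 1H shift in ppm, 15N shift in ppm).
# These values are made up for the example, not taken from any protein.
PEAKS = [
    ("A", 8.10, 120.0),
    ("G", 8.11, 120.1),
    ("L", 8.50, 118.0),
    ("K", 8.51, 118.1),
    ("A", 7.90, 125.0),
]


def count_overlaps(peaks, assignment, tol_h=0.02, tol_n=0.2):
    """Count peak pairs that fall in the same sample and within tolerance.

    `assignment` maps an amino acid type to the index of the sample in
    which that type is labeled (an assumption of this sketch).
    """
    overlaps = 0
    for i in range(len(peaks)):
        aa_i, h_i, n_i = peaks[i]
        for j in range(i + 1, len(peaks)):
            aa_j, h_j, n_j = peaks[j]
            # Peaks in different samples never appear in the same spectrum.
            if assignment[aa_i] != assignment[aa_j]:
                continue
            if abs(h_i - h_j) <= tol_h and abs(n_i - n_j) <= tol_n:
                overlaps += 1
    return overlaps


def best_schedule(peaks, n_samples):
    """Exhaustively search all assignments of amino acid types to samples.

    Feasible only for a handful of types; shown to make the cost
    function concrete, not as a practical method.
    """
    types = sorted({aa for aa, _, _ in peaks})
    best_cost, best_assignment = None, None
    for labels in product(range(n_samples), repeat=len(types)):
        assignment = dict(zip(types, labels))
        cost = count_overlaps(peaks, assignment)
        if best_cost is None or cost < best_cost:
            best_cost, best_assignment = cost, assignment
    return best_cost, best_assignment


if __name__ == "__main__":
    single = {aa: 0 for aa, _, _ in PEAKS}
    print("overlaps with one sample:", count_overlaps(PEAKS, single))
    cost, schedule = best_schedule(PEAKS, n_samples=2)
    print("overlaps with two samples:", cost, schedule)
```

In this toy case, splitting the amino acid types across two samples separates each close pair of peaks into different spectra. The exhaustive search grows exponentially with the number of amino acid types, which is why a more efficient exact method is needed for real proteins.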
For 2D (15)N-(1)H HSQC experiments, we can produce an exact solution using a dynamic programming algorithm in under 2 h on a standard desktop machine. Applying the method to a standard benchmark protein, calmodulin, we are able to reduce the number of overlaps in the 500 MHz HSQC spectrum from 10 to 1 using four samples with a true cost function, and from 10 to 4 when the cost function is derived from statistical estimates. On a set of 448 curated proteins from the BMRB database, we are able to reduce the relative percent congestion in their HSQC spectra by 84.9% using only four samples. Our method can be applied in a high-throughput manner on a proteomic scale using the server we developed. On a 100-node cluster, optimal schedules can be computed for every protein coded for in the human genome in less than a month.
A server for creating labeling schedules for (15)N-(1)H HSQC experiments as well as results for each of the individual 448 proteins used in the test set is available at http://nmr.proteomics.ics.uci.edu.