Arash Bahrami, Amir H. Assadi, John L. Markley, Hamid R. Eghbalnia
Biochemistry Department, National Magnetic Resonance Facility at Madison, University of Wisconsin-Madison, Madison, Wisconsin, United States of America.
PLoS Comput Biol. 2009 Mar;5(3):e1000307. doi: 10.1371/journal.pcbi.1000307. Epub 2009 Mar 13.
The process of assigning a finite set of tags or labels to a collection of observations, subject to side conditions, is notable for its computational complexity. This labeling paradigm is of theoretical and practical relevance to a wide range of biological applications, including the analysis of data from DNA microarrays, metabolomics experiments, and biomolecular nuclear magnetic resonance (NMR) spectroscopy. We present a novel algorithm, called Probabilistic Interaction Network of Evidence (PINE), that achieves robust, unsupervised probabilistic labeling of data. The computational core of PINE uses estimates of evidence derived from empirical distributions of previously observed data, along with consistency measures, to drive a fictitious system M with Hamiltonian H to a quasi-stationary state that produces probabilistic label assignments for relevant subsets of the data. We demonstrate the successful application of PINE to a key task in protein NMR spectroscopy: that of converting peak lists extracted from various NMR experiments into assignments associated with probabilities for their correctness. This application, called PINE-NMR, is available from a freely accessible computer server (http://pine.nmrfam.wisc.edu). The PINE-NMR server accepts as input the sequence of the protein plus user-specified combinations of data corresponding to an extensive list of NMR experiments; it provides as output a probabilistic assignment of NMR signals (chemical shifts) to sequence-specific backbone and aliphatic side chain atoms plus a probabilistic determination of the protein secondary structure. PINE-NMR can accommodate prior information about assignments or stable isotope labeling schemes. As part of the analysis, PINE-NMR identifies, verifies, and rectifies problems related to chemical shift referencing or erroneous input data. PINE-NMR achieves robust and consistent results that have been shown to be effective in subsequent steps of NMR structure determination.
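The idea of combining per-observation evidence with consistency measures and relaxing a system toward a quasi-stationary state can be illustrated with a generic damped mean-field update. The sketch below is not the PINE implementation; it is a toy example under stated assumptions. The function name `relax_labels`, the inputs `log_evidence`, `weights`, and `compat`, and the parameters `beta` and `damping` are all hypothetical, and the energy being lowered is assumed to be a simple sum of an evidence term and a pairwise consistency term.

```python
import math


def softmax(scores):
    """Convert a list of scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def relax_labels(log_evidence, weights, compat,
                 beta=1.0, damping=0.5, tol=1e-6, max_iter=200):
    """Damped mean-field relaxation for probabilistic labeling (illustrative only).

    log_evidence[i][a]: assumed log-score that observation i carries label a
    weights[i][j]:      assumed strength of the link between observations i and j
    compat[a][b]:       assumed reward when linked observations take labels a and b
    """
    n = len(log_evidence)
    k = len(compat)
    # Start from the evidence alone, ignoring consistency.
    probs = [softmax([beta * s for s in row]) for row in log_evidence]
    for _ in range(max_iter):
        updated = []
        for i in range(n):
            field = []
            for a in range(k):
                # Expected consistency reward from the current beliefs of neighbors.
                support = sum(
                    weights[i][j] * sum(compat[a][b] * probs[j][b] for b in range(k))
                    for j in range(n) if j != i
                )
                field.append(beta * (log_evidence[i][a] + support))
            target = softmax(field)
            # Damping blends old and new beliefs to discourage oscillation.
            updated.append([damping * p + (1.0 - damping) * t
                            for p, t in zip(probs[i], target)])
        delta = max(abs(updated[i][a] - probs[i][a])
                    for i in range(n) for a in range(k))
        probs = updated
        if delta < tol:  # beliefs have stopped changing: treat as quasi-stationary
            break
    return probs


# Toy data: observation 0 favors label 0, observation 2 favors label 1,
# and observation 1 is ambiguous on its own but linked to observation 0.
log_evidence = [[0.0, -3.0], [-0.7, -0.7], [-3.0, 0.0]]
weights = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
compat = [[1, 0], [0, 1]]  # linked observations are rewarded for agreeing

probs = relax_labels(log_evidence, weights, compat)
for i, row in enumerate(probs):
    print(i, [round(p, 3) for p in row])
```

In this toy setting the ambiguous observation is pulled toward the label of its linked neighbor, while the unlinked observation keeps the probabilities implied by its evidence alone. The output is a probability per label rather than a single hard assignment, which mirrors the kind of result the abstract describes.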