Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, USA; Center for Computation and Technology, Louisiana State University, Baton Rouge, LA, USA.
Front Genet. 2013 Jun 19;4:118. doi: 10.3389/fgene.2013.00118. eCollection 2013.
Protein threading is widely used in protein structure prediction and subsequent functional annotation. Most threading approaches apply similar criteria to identify templates for both protein structure and function modeling. Using structure similarity alone might result in a high false positive rate in protein function inference, which suggests that the selection of functional templates should be subject to a different set of constraints. In this study, we extend the functionality of eThread, a recently developed approach to meta-threading, focusing on the optimal selection of functional templates. We optimized the selection of template proteins to cover a broad spectrum of protein molecular function: ligand, metal, inorganic cluster, protein, and nucleic acid binding. In large-scale benchmarks, we demonstrate that the recognition rates in identifying templates that bind molecular partners in similar locations are very high, typically 70-80%, at a relatively low false positive rate. eThread also provides useful insights into the chemical properties of binding molecules and the structural features of binding. For instance, the sensitivity in recognizing similar protein-binding interfaces is 58% at a false positive rate of only 18%. Furthermore, in a comparative analysis, we demonstrate that meta-threading supported by machine learning outperforms single-threading approaches in functional template selection. We show that meta-threading effectively detects many facets of protein molecular function, even in the low sequence identity regime. The enhanced version of eThread is freely available as a webserver and stand-alone software at http://www.brylinski.org/ethread.