Yeh Hsin-Yi Cindy, Lindsey Aaron, Wu Chih-Peng, Thomas Shawna, Amato Nancy M
Parasol Lab, Department of Computer Science & Engineering, Texas A&M University, College Station, Texas.
J Comput Biol. 2015 Sep;22(9):823-36. doi: 10.1089/cmb.2015.0116. Epub 2015 Aug 10.
Predicting protein structures and simulating protein folding are two of the most important problems in computational biology today. Simulation methods rely on a scoring function to distinguish the native structure (the most energetically stable one) from non-native structures. Decoy databases are collections of non-native structures used to test and verify these functions. We present a method to evaluate and improve the quality of decoy databases by adding novel structures and removing redundant ones. We test our approach on 20 different decoy databases of varying size and type and show significant improvement across a variety of metrics. We also test our improved databases on two popular modern scoring functions and show that in most cases they contain a greater or equal number of native-like structures than the original databases, thereby producing a more rigorous database for testing scoring functions.
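The abstract does not state how redundancy between decoys is measured. Purely as an illustration, the Python sketch below shows one generic way a redundancy filter can work. It assumes each decoy is a list of 3D atom coordinates (for example, C-alpha positions) of equal length, uses distance RMSD (dRMS) as the similarity measure because it needs no structural superposition, and keeps a decoy only if it differs from every decoy already kept by at least a chosen threshold. This is not the authors' method; the names `drms`, `remove_redundant`, and `threshold` are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

# Hypothetical representation: one conformation = a list of (x, y, z) coordinates.
Coord = Tuple[float, float, float]


def pairwise_distances(conf: Sequence[Coord]) -> List[float]:
    """Return the distances between all atom pairs i < j of one conformation."""
    return [math.dist(a, b) for a, b in combinations(conf, 2)]


def drms(conf_a: Sequence[Coord], conf_b: Sequence[Coord]) -> float:
    """Distance RMSD: compares internal distances, so no superposition is needed."""
    if len(conf_a) != len(conf_b):
        raise ValueError("conformations must have the same number of atoms")
    da, db = pairwise_distances(conf_a), pairwise_distances(conf_b)
    if not da:
        return 0.0  # fewer than two atoms: nothing to compare
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(da, db)) / len(da))


def remove_redundant(decoys: Sequence[Sequence[Coord]], threshold: float) -> List[int]:
    """Greedy filter (hypothetical): return indices of decoys to keep.

    A decoy is kept only if its dRMS to every previously kept decoy
    is at least `threshold`; otherwise it is treated as redundant.
    """
    kept: List[int] = []
    for i, conf in enumerate(decoys):
        if all(drms(conf, decoys[k]) >= threshold for k in kept):
            kept.append(i)
    return kept


# Toy example: decoy 1 is a rigid translation of decoy 0 (same internal
# distances, dRMS = 0), while decoy 2 is a genuinely different bent shape.
decoys = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(5.0, 5.0, 5.0), (6.0, 5.0, 5.0), (7.0, 5.0, 5.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)],
]
print(remove_redundant(decoys, threshold=0.1))  # → [0, 2]
```

A greedy pass like this depends on the order in which decoys are visited and costs one dRMS evaluation per (decoy, kept decoy) pair; a real pipeline might instead cluster all decoys first or use superposition-based RMSD, and the threshold would need to be chosen per protein size.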