Biró Bálint, Zhao Bi, Kurgan Lukasz
Institute of Genetics and Biotechnology, Hungarian University of Agriculture and Life Sciences, Gödöllő, Hungary.
Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States.
Comput Struct Biotechnol J. 2022 May 6;20:2223-2234. doi: 10.1016/j.csbj.2022.05.003. eCollection 2022.
Sequence-based predictors of residue-level protein function and structure cover a broad spectrum of characteristics, including intrinsic disorder, secondary structure, solvent accessibility, and binding to nucleic acids. They have been catalogued and evaluated in numerous surveys and assessments. However, methods that focus on a given characteristic are studied separately from predictors of other characteristics, even though they are typically applied to the same proteins. We fill this void by studying the complementarity of a representative collection of methods that target different characteristics, using a large, taxonomically consistent, low-similarity dataset of human proteins. First, we bridge the gap between the communities that develop structure-trained vs. disorder-trained predictors of binding residues. Motivated by a recent study of protein-binding residue predictions, we empirically find that combining the structure-trained and disorder-trained predictors of DNA-binding and RNA-binding residues leads to substantial improvements in predictive quality. Second, we investigate whether diverse predictors generate results that accurately reproduce the relations between secondary structure, solvent accessibility, interaction sites, and intrinsic disorder that are present in the experimental data. Our empirical analysis shows that the predictions accurately reflect all combinations of these relations. Altogether, this study provides unique insights that support combining the results produced by diverse residue-level predictors of protein function and structure.