Human Health Therapeutics Research Centre, National Research Council Canada, 6100 Royalmount Avenue, Montreal, Quebec H4P 2R2, Canada.
J Chem Inf Model. 2023 Aug 28;63(16):5169-5181. doi: 10.1021/acs.jcim.3c00165. Epub 2023 Aug 7.
The medically relevant field of protein-based therapeutics has triggered a demand for protein engineering in different pH environments of biological relevance. In silico engineering workflows typically employ high-throughput screening campaigns that require evaluating large sets of protein residues and point mutations by fast yet accurate computational algorithms. While several high-throughput pKa prediction methods exist, their accuracies are unclear due to the lack of a current, comprehensive benchmark. Here, seven fast, efficient, and accessible approaches, including PROPKA3, DeepKa, PKAI, PKAI+, DelPhiPKa, MCCE2, and H++, were systematically tested on a nonredundant subset of 408 measured protein residue pKa shifts from the pKa database (PKAD). While no method outperformed the null hypotheses with confidence, as illustrated by statistical bootstrapping, DeepKa, PKAI+, PROPKA3, and H++ had utility. More specifically, DeepKa consistently performed well in tests across multiple and individual amino acid residue types, as reflected by lower errors, higher correlations, and improved classifications. Arithmetic averaging of the best empirical predictors into simple consensuses improved overall transferability and accuracy, reaching a root-mean-square error of 0.76 pKa units and a correlation coefficient (R) of 0.45 to experimental pKa shifts. This analysis should provide a basis for further methodological developments and guide future applications that require embedding of computationally inexpensive pKa prediction methods, such as the optimization of antibodies for pH-dependent antigen binding.
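The evaluation ideas named in the abstract (arithmetic consensus of predictors, root-mean-square error, correlation with experiment, and a bootstrap comparison against a null model) can be illustrated with a minimal Python sketch. This is not the authors' code. The function names, the toy numbers, and the choice of a zero-shift null model are all assumptions made for illustration; none of the values come from PKAD or from the paper.

```python
import math
import random


def rmse(pred, ref):
    """Root-mean-square error between predicted and reference pKa shifts."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def consensus(*predictions):
    """Arithmetic mean of several predictors, taken residue by residue."""
    return [sum(vals) / len(vals) for vals in zip(*predictions)]


def bootstrap_rmse_gain(pred, null, ref, n_boot=2000, seed=0):
    """95% percentile interval of RMSE(null) - RMSE(pred).

    Residues are resampled with replacement. An interval that includes
    zero means the method is not confidently better than the null model.
    """
    rng = random.Random(seed)
    n = len(ref)
    gains = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        r = [ref[i] for i in idx]
        gains.append(rmse([null[i] for i in idx], r) - rmse([pred[i] for i in idx], r))
    gains.sort()
    return gains[int(0.025 * n_boot)], gains[int(0.975 * n_boot) - 1]


# Toy pKa shifts (invented for illustration only).
experimental = [0.5, -1.0, 0.2, 1.5, -0.3, 0.0]
method_a = [0.3, -0.6, 0.0, 1.0, -0.5, 0.4]   # hypothetical predictor A
method_b = [0.9, -1.2, 0.6, 1.2, 0.1, -0.2]   # hypothetical predictor B
null_model = [0.0] * len(experimental)         # assumed null: no shift at all

combined = consensus(method_a, method_b)
print("RMSE A:", rmse(method_a, experimental))
print("RMSE B:", rmse(method_b, experimental))
print("RMSE consensus:", rmse(combined, experimental))
print("R consensus:", pearson_r(combined, experimental))
print("95% interval of gain over null:", bootstrap_rmse_gain(combined, null_model, experimental))
```

In this toy set the two predictors err in partly opposite directions, so averaging cancels some of the error. That is the general motivation for a simple consensus, although whether it helps on real data depends on how correlated the methods' errors are.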