Department of Chemical and Biological Engineering and Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute, Troy, New York, USA.
Biotechnology Discovery Research, Eli Lilly Biotechnology Center, San Diego, California, USA.
MAbs. 2022 Jan-Dec;14(1):2062807. doi: 10.1080/19420862.2022.2062807.
Although monoclonal antibodies (mAbs) have been shown to be extremely effective in treating a number of diseases, they often suffer from poor developability attributes, such as high viscosity and low solubility at elevated concentrations. Since experimental candidate screening is often materials- and labor-intensive, there is substantial interest in developing tools for expediting mAb design. Here, we present a strategy using machine learning-based QSAR models for the a priori estimation of mAb solubility. The extrapolated protein solubilities of a set of 111 antibodies in a histidine buffer were determined using a high-throughput PEG precipitation assay. 3D homology models of the antibodies were generated, and a large set of in-house and commercially available molecular descriptors was then calculated. The resulting experimental and descriptor data were used to develop QSAR models of mAb solubility. After feature selection and training with different machine learning algorithms, the models were evaluated on external test sets. The two regression models developed estimated the solubilities of the external test set data with R values of 0.81 and 0.85, respectively. In addition, three-class and binary classification models were developed and shown to be good estimators of mAb solubility behavior, with overall test set accuracies of 0.70 and 0.95, respectively. Analysis of the molecular descriptors selected in these models was also informative, suggesting that several charge-based descriptors and the antibody isotype may play important roles in mAb solubility. Combining high-throughput experimental techniques for relative solubility with efficient machine learning QSAR models offers an opportunity to rapidly screen potential mAb candidates and to design therapeutics with improved solubility characteristics.
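The abstract reports extrapolated solubilities from a PEG precipitation assay but does not state the extrapolation model. A common choice in the PEG precipitation literature is the log-linear relation log10(S) = log10(S0) - beta * [PEG], fitted to points in the precipitation region and extrapolated to zero PEG. The minimal sketch below assumes that model; the function name `extrapolate_peg_solubility` and the example data are hypothetical and are not taken from the study.

```python
import math


def extrapolate_peg_solubility(peg_percent, soluble_conc):
    """Hypothetical helper: least-squares fit of log10(S) = log10(S0) - beta * [PEG].

    Assumes the log-linear PEG precipitation model and that only points in the
    precipitation region (where soluble protein falls as PEG rises) are passed in.
    Returns (S0, beta): apparent solubility extrapolated to 0% PEG, and the slope.
    """
    n = len(peg_percent)
    if n != len(soluble_conc) or n < 2:
        raise ValueError("need at least two matched (PEG, concentration) points")
    log_s = [math.log10(c) for c in soluble_conc]
    x_mean = sum(peg_percent) / n
    y_mean = sum(log_s) / n
    # Ordinary least-squares slope and intercept in log space.
    sxx = sum((x - x_mean) ** 2 for x in peg_percent)
    if sxx == 0:
        raise ValueError("PEG concentrations must not all be equal")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(peg_percent, log_s))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # The intercept at [PEG] = 0 is log10(S0); beta is the negated slope.
    return 10 ** intercept, -slope


if __name__ == "__main__":
    # Synthetic, illustrative data generated from S0 = 100 mg/mL, beta = 0.2 per % PEG.
    peg = [10.0, 12.0, 14.0, 16.0]
    conc = [100.0 * 10 ** (-0.2 * x) for x in peg]
    s0, beta = extrapolate_peg_solubility(peg, conc)
    print(f"S0 = {s0:.1f} mg/mL, beta = {beta:.2f}")  # S0 = 100.0 mg/mL, beta = 0.20
```

Fitting in log space keeps the model linear in [PEG]. In practice the choice of fit range matters, since points outside the precipitation region would bias the extrapolated intercept.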
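The regression and classification figures above are summary statistics on held-out test data. Assuming R denotes the Pearson correlation between measured and predicted solubilities (the abstract does not specify whether R or R² is meant) and that accuracy is the fraction of correctly classified test antibodies, both can be computed as in this standard-library sketch. The function names and example values are hypothetical.

```python
import math


def pearson_r(measured, predicted):
    """Pearson correlation coefficient between measured and predicted values."""
    n = len(measured)
    if n != len(predicted) or n < 2:
        raise ValueError("need at least two matched values")
    m_mean = sum(measured) / n
    p_mean = sum(predicted) / n
    cov = sum((m - m_mean) * (p - p_mean) for m, p in zip(measured, predicted))
    var_m = sum((m - m_mean) ** 2 for m in measured)
    var_p = sum((p - p_mean) ** 2 for p in predicted)
    if var_m == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant inputs")
    return cov / math.sqrt(var_m * var_p)


def accuracy(true_labels, predicted_labels):
    """Overall accuracy: fraction of test samples whose class is predicted correctly."""
    if len(true_labels) != len(predicted_labels) or not true_labels:
        raise ValueError("need matched, non-empty label lists")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


if __name__ == "__main__":
    # Illustrative values only; these are not data from the study.
    measured = [12.0, 35.0, 58.0, 80.0]
    predicted = [15.0, 30.0, 61.0, 76.0]
    print(round(pearson_r(measured, predicted), 3))
    # Hypothetical three-class labels (low / medium / high solubility).
    print(accuracy(["low", "high", "medium", "high"], ["low", "high", "high", "high"]))  # 0.75
```

If R² is intended instead, note that squaring the Pearson R matches the out-of-sample coefficient of determination, 1 - SS_res/SS_tot, only in special cases, so the two should not be used interchangeably.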