Arsiwala Ammar, Bhatt Rebecca, van Niekerk Lood, Quintero-Cadena Porfirio, Ao Xiang, Rosenbaum Adam, Bhatt Aanal, Smith Alexander, Yang Yaoyu, Anderson K C, Grippo Lucia, Cao Xing, Cohen Rich, Patel Jay, Moller Joshua, Allen Olga, Faraj Ali, Nandy Anisha, Hocking Jason, Ergun Ayla, Tural Berk, Salvador Sara, Jacobowitz Joe, Schaven Kristin, Sherman Mark, Shah Sanjiv, Tessier Peter M, Borhani David W
Ginkgo Bioworks, Inc., Boston, MA, USA.
Cypher Technologies Inc., Cambridge, MA, USA.
MAbs. 2025 Dec;17(1):2593055. doi: 10.1080/19420862.2025.2593055. Epub 2025 Dec 2.
Antibodies must bind their targets with high affinity and specificity to achieve useful therapeutic activity. They must also possess suitable developability properties (e.g., thermostability, solubility, viscosity, polyreactivity) to ensure favorable manufacturing, formulation, and performance. Both binding and developability properties are inherent to a given antibody amino acid sequence. Identifying or selecting antibodies with suitable binding characteristics is now routine, and computational design models, trained on extensive complementarity-determining region sequence and structural data, are rapidly improving. Developability properties, however, remain difficult to predict, largely because training data are insufficient, so empirical testing is heavily used to avoid problems in late-stage antibody development. To fill this gap, we built a high-throughput antibody developability assay platform designed to generate the large datasets needed to train improved machine learning (ML) models. We optimized and automated known developability assays and developed a robust, integrated data analytics pipeline. Here, we report data on 246 antibodies (106 approved, 135 clinical-stage, and 5 preregistration/withdrawn molecules) across a panel of 10 developability assays, in a "tidy data" format suitable for AI/ML modeling. We used these data to develop an XGBoost ML model that predicts similarity to approved antibodies better than the conventional use of developability warning thresholds. Additionally, we confirm that preliminary predictive models improve with more training data. Our high-throughput PROPHET-Ab platform enables data generation at the scale needed to develop improved ML models to predict antibody developability.
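To make the "tidy data" format and the warning-threshold baseline concrete, the following is a minimal sketch and not the authors' pipeline. It assumes a long-format table with one row per antibody and assay, pivots it to one feature row per antibody, and counts how many assays breach a threshold. The antibody IDs, assay names, values, and cutoffs are all invented for illustration and are not taken from the published dataset.

```python
from collections import defaultdict

# Hypothetical tidy (long-format) records: one row per antibody and assay.
# IDs, assay names, and values are invented for illustration only.
TIDY_ROWS = [
    {"antibody": "mAb-A", "assay": "tm_celsius", "value": 71.0},
    {"antibody": "mAb-A", "assay": "polyreactivity_score", "value": 0.05},
    {"antibody": "mAb-B", "assay": "tm_celsius", "value": 62.0},
    {"antibody": "mAb-B", "assay": "polyreactivity_score", "value": 0.40},
]

# Hypothetical warning thresholds as (direction, cutoff).
# "below" flags values under the cutoff; "above" flags values over it.
THRESHOLDS = {
    "tm_celsius": ("below", 65.0),
    "polyreactivity_score": ("above", 0.25),
}


def pivot_wide(rows):
    """Turn tidy rows into {antibody: {assay: value}}."""
    wide = defaultdict(dict)
    for row in rows:
        wide[row["antibody"]][row["assay"]] = row["value"]
    return dict(wide)


def count_flags(features, thresholds):
    """Count how many assays breach their warning threshold."""
    flags = 0
    for assay, (direction, cutoff) in thresholds.items():
        value = features.get(assay)
        if value is None:
            continue  # a missing measurement raises no flag
        if direction == "below" and value < cutoff:
            flags += 1
        elif direction == "above" and value > cutoff:
            flags += 1
    return flags


wide = pivot_wide(TIDY_ROWS)
flag_counts = {ab: count_flags(feats, THRESHOLDS) for ab, feats in wide.items()}
```

The wide table produced by `pivot_wide` is the kind of per-antibody feature matrix on which a gradient-boosted classifier such as XGBoost could be trained. The model itself is not reproduced here, since the abstract does not specify its features or hyperparameters.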