Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.
Computational and Systems Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.
Cell Rep Methods. 2022 Jul 11;2(7):100254. doi: 10.1016/j.crmeth.2022.100254. eCollection 2022 Jul 18.
Effective biologics require high specificity and limited off-target binding, but these properties are not guaranteed by current affinity-selection-based discovery methods. Molecular counterselection against off targets is a technique for identifying nonspecific sequences, but it is experimentally costly and can fail to eliminate a large fraction of nonspecific sequences. Here, we introduce computational counterselection, a framework for removing nonspecific sequences from pools of candidate biologics using machine learning models. We demonstrate the method using sequencing data from single-target affinity selection of antibodies, bypassing combinatorial experiments. Using cross-target selection and individual binding assays to measure how well each method retains on-target, specific antibodies and identifies and eliminates off-target, nonspecific antibodies, we show that computational counterselection outperforms molecular counterselection. Further, we show that one can identify generally polyspecific antibody sequences using a general model trained on affinity data from unrelated targets that have potential affinity for a broad range of sequences.
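The filtering idea behind computational counterselection can be illustrated with a minimal Python sketch. It is not the authors' implementation: the scoring functions stand in for trained machine learning models, and the names `computational_counterselection`, `table_scorer`, `on_min`, and `off_max`, the threshold values, and the toy sequences and scores are all hypothetical, chosen only for illustration.

```python
from typing import Callable, Dict, List

# A scorer maps a candidate sequence to a predicted binding score.
# In practice this would be a trained model; here it is an assumption.
Scorer = Callable[[str], float]


def computational_counterselection(
    candidates: List[str],
    on_target: Scorer,
    off_targets: Dict[str, Scorer],
    on_min: float = 0.5,   # hypothetical cutoff for predicted on-target binding
    off_max: float = 0.5,  # hypothetical cutoff for predicted off-target binding
) -> List[str]:
    """Keep sequences predicted to bind the target and none of the off targets."""
    kept = []
    for seq in candidates:
        # Drop sequences that are not predicted to bind the intended target.
        if on_target(seq) < on_min:
            continue
        # Drop sequences predicted to bind any off target (nonspecific).
        if any(model(seq) > off_max for model in off_targets.values()):
            continue
        kept.append(seq)
    return kept


def table_scorer(table: Dict[str, float]) -> Scorer:
    """Toy stand-in for a trained model: look scores up in a table."""
    return lambda seq: table.get(seq, 0.0)


# Made-up scores for three made-up sequences (not data from the paper).
toy_on = {"ARDYW": 0.9, "GRRKW": 0.8, "ASSGY": 0.2}
toy_off = {"ARDYW": 0.1, "GRRKW": 0.7, "ASSGY": 0.1}

specific = computational_counterselection(
    list(toy_on),
    table_scorer(toy_on),
    {"off_target_A": table_scorer(toy_off)},
)
print(specific)  # ['ARDYW']
```

In this toy run, `GRRKW` is removed because its predicted off-target score exceeds the cutoff, and `ASSGY` is removed because it is not predicted to bind the intended target. A real pipeline would replace the lookup tables with models trained on single-target affinity-selection sequencing data, as the abstract describes.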