GigaGen Inc. (A Grifols Company), South San Francisco, CA, USA.
MAbs. 2022 Jan-Dec;14(1):2069075. doi: 10.1080/19420862.2022.2069075.
The antibody drug field has continually sought better methods for candidate discovery and engineering. Historically, most such methods have been laboratory-based, but informatics methods have recently started to make an impact. Deep learning, a subfield of machine learning, is rapidly gaining prominence in biomedical research. Recent advances in microfluidics and next-generation sequencing have not only revolutionized therapeutic antibody discovery but also produced a vast amount of antibody repertoire sequencing data, creating opportunities for deep learning-based applications. Previously, we used microfluidics, yeast display, and deep sequencing to generate a panel of binder and non-binder antibody sequences against the cancer immunotherapy targets PD-1 and CTLA-4. Here, we encoded the light and heavy chain complementarity-determining region 3 (CDR3) sequences into antibody images, then built and trained convolutional neural network models to classify binders and non-binders. To improve model interpretability, we performed in silico mutagenesis to identify CDR3 residues that were important for binder classification. We further built generative adversarial network models to produce synthetic antibodies against PD-1 and CTLA-4. These models generated variable-length CDR3 sequences that resemble real sequences. Overall, our study demonstrates that deep learning methods can be leveraged to mine and learn patterns in antibody sequences, offering insights into antibody engineering, optimization, and discovery.
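As an illustration of the encoding step described above, the following is a minimal sketch of how paired CDR3 sequences might be turned into an image-like matrix and how single-residue mutants could be enumerated for a mutagenesis scan. The abstract does not specify the encoding, so the one-hot scheme, the padding length `max_len`, the function names, and the example sequences are all assumptions made for illustration, not the authors' method.

```python
from typing import Iterator, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_cdr3(seq: str, max_len: int) -> List[List[int]]:
    """Return a max_len x 20 binary matrix; rows past the sequence end stay zero."""
    if len(seq) > max_len:
        raise ValueError(f"sequence longer than max_len={max_len}")
    image = [[0] * len(AMINO_ACIDS) for _ in range(max_len)]
    for pos, aa in enumerate(seq.upper()):
        image[pos][AA_INDEX[aa]] = 1  # raises KeyError on non-standard residues
    return image


def encode_pair(light_cdr3: str, heavy_cdr3: str, max_len: int = 25) -> List[List[int]]:
    """Stack light and heavy CDR3 matrices into one (2 * max_len) x 20 'image'."""
    return one_hot_cdr3(light_cdr3, max_len) + one_hot_cdr3(heavy_cdr3, max_len)


def point_mutants(seq: str) -> Iterator[Tuple[int, str, str]]:
    """Yield (position, new_residue, mutant_sequence) for every single substitution."""
    for pos, wild_type in enumerate(seq):
        for aa in AMINO_ACIDS:
            if aa != wild_type:
                yield pos, aa, seq[:pos] + aa + seq[pos + 1:]


# Hypothetical CDR3 sequences, used only to exercise the functions.
image = encode_pair("QQYNSYPLT", "ARDYYGSSYFDY")
mutants = list(point_mutants("ARD"))
```

In a scan of this kind, each mutant would be re-encoded and scored by the trained classifier, and positions where substitutions change the predicted class most would be flagged as important; the classifier itself is outside the scope of this sketch.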