Prediction of Antibody-Antigen Binding via Machine Learning: Development of Data Sets and Evaluation of Methods.

Authors

Ye Chao, Hu Wenxing, Gaeta Bruno

Affiliations

School of Computer Science and Engineering, The University of New South Wales, Sydney, Australia.

Department of Computer Science, School of Information Science and Technology, Tokyo Institute of Technology, Tokyo, Japan.

Publication

JMIR Bioinform Biotechnol. 2022 Oct 28;3(1):e29404. doi: 10.2196/29404.

Abstract

BACKGROUND

The mammalian immune system is able to generate antibodies against a huge variety of antigens, including bacteria, viruses, and toxins. The ultradeep DNA sequencing of rearranged immunoglobulin genes has considerable potential in furthering our understanding of the immune response, but it is limited by the lack of a high-throughput, sequence-based method for predicting the antigen(s) that a given immunoglobulin recognizes.

OBJECTIVE

As a step toward predicting antibody-antigen binding from sequence data alone, we aimed to compare a range of machine learning approaches applied to a collated data set of antibody-antigen pairs.

METHODS

Data for training and testing were extracted from the Protein Data Bank and the Coronavirus Antibody Database, and additional antibody-antigen pair data were generated by using a molecular docking protocol. Several machine learning methods, including the weighted nearest neighbor method, the nearest neighbor method with the BLOSUM62 matrix, and the random forest method, were applied to the problem.
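
To make the nearest neighbor idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each example is an (antibody, antigen) sequence pair with a binary binding label, that compared sequences have equal length, and that similarity is the sum of per-residue substitution scores. The names `pair_similarity`, `predict_binding`, and `toy_score` are hypothetical, and `toy_score` is a simple match/mismatch placeholder standing in for a real BLOSUM62 lookup.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (antibody sequence, antigen sequence)


def toy_score(x: str, y: str) -> int:
    """Placeholder residue score (+1 match, -1 mismatch).

    This is NOT BLOSUM62; a real substitution matrix lookup
    would be passed in its place.
    """
    return 1 if x == y else -1


def pair_similarity(a: str, b: str, score: Callable[[str, str], int]) -> int:
    """Sum position-wise substitution scores of two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length sequences")
    return sum(score(x, y) for x, y in zip(a, b))


def predict_binding(
    query: Pair,
    training: List[Tuple[Pair, int]],
    score: Callable[[str, str], int] = toy_score,
) -> int:
    """Return the label of the most similar training pair (1-nearest neighbor)."""
    if not training:
        raise ValueError("training set is empty")
    best_label, best_sim = None, None
    for (antibody, antigen), label in training:
        # Similarity of a pair is assumed here to be the sum of the
        # antibody-antibody and antigen-antigen similarities.
        sim = (pair_similarity(query[0], antibody, score)
               + pair_similarity(query[1], antigen, score))
        if best_sim is None or sim > best_sim:
            best_sim, best_label = sim, label
    return best_label


# Made-up toy sequences, for illustration only.
training_pairs = [
    (("ARDY", "KLMN"), 1),  # labeled as binding
    (("GGSS", "KLMN"), 0),  # labeled as non-binding
]
```

Passing a different `score` function (for example, one backed by a substitution matrix loaded from a bioinformatics library) changes the similarity measure without changing the neighbor search itself.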

RESULTS

The final data set contained 1157 antibodies and 57 antigens that were combined in 5041 antibody-antigen pairs. The best performance for the prediction of interactions was obtained by using the nearest neighbor method with the BLOSUM62 matrix, which resulted in around 82% accuracy on the full data set. These results provide a useful frame of reference, as well as protocols and considerations, for machine learning and data set creation in the prediction of antibody-antigen binding.

CONCLUSIONS

Several machine learning approaches were compared to predict antibody-antigen interaction from protein sequences. Both the data set (in CSV format) and the machine learning program (coded in Python) are freely available for download on GitHub.
