A framework for predicting variable-length epitopes of human-adapted viruses using machine learning methods.

Affiliations

Department of Biomedical Informatics, Harvard Medical School, Boston, USA.

Department of Statistics, University of Oxford, Oxford, UK.

Publication information

Brief Bioinform. 2022 Sep 20;23(5). doi: 10.1093/bib/bbac281.

Abstract

The coronavirus disease 2019 pandemic has alerted people to the threat posed by viruses. Vaccines are the most effective way to prevent a disease from spreading. The interaction between antibodies and antigens clears infectious organisms from the host. Identifying B-cell epitopes is critical for vaccine design, the development of disease diagnostics and antibody production. However, traditional experimental methods for determining epitopes are time-consuming and expensive, and the predictive performance of existing in silico methods is not satisfactory. This paper develops a general framework for predicting variable-length linear B-cell epitopes specific to human-adapted viruses, using machine learning approaches based on the ProtVec representation of peptides and the physicochemical properties of amino acids. QR decomposition is incorporated into the embedding process, which enables the models to handle variable-length sequences. Experimental results on large immune epitope datasets show that the proposed model outperforms state-of-the-art methods in terms of AUROC (0.827) and AUPR (0.831) on the testing set. Moreover, sequence analysis also identifies the viral category of the corresponding predicted epitopes with high precision. The framework is therefore shown to reliably identify linear B-cell epitopes of human-adapted viruses from protein sequences, and could provide assistance in potential future pandemics and epidemics.
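The abstract does not say how QR decomposition is applied during embedding, so the following is only a minimal sketch of one way a QR step can map peptides of different lengths to a fixed-length feature vector. Everything named here is an assumption made for illustration: the 8-dimensional embedding size, the `kmer_vector` stand-in (seeded random vectors instead of trained ProtVec embeddings), and the choice to keep the zero-padded upper-triangular factor R as the feature vector. It is not the paper's implementation.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
EMBED_DIM = 8  # hypothetical size, kept small for readability


def kmer_vector(kmer, dim=EMBED_DIM):
    """Hypothetical stand-in for a trained ProtVec lookup table.

    Returns a deterministic pseudo-random vector per k-mer (assumption);
    raises ValueError for non-standard residues.
    """
    seed = 0
    for residue in kmer:
        seed = seed * len(AMINO_ACIDS) + AMINO_ACIDS.index(residue)
    return np.random.default_rng(seed).standard_normal(dim)


def embed_peptide(peptide, k=3, dim=EMBED_DIM):
    """Map a peptide of any length >= k to a vector of length dim*(dim+1)/2."""
    kmers = [peptide[i:i + k] for i in range(len(peptide) - k + 1)]
    if not kmers:
        raise ValueError("peptide is shorter than k")

    # One row per overlapping k-mer: shape (n_kmers, dim), n_kmers varies.
    E = np.stack([kmer_vector(km, dim) for km in kmers])

    # Reduced QR, keeping only R: shape (min(n_kmers, dim), dim).
    R = np.linalg.qr(E, mode="r")

    # QR is unique only up to row signs; force a non-negative diagonal.
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    R = signs[:, None] * R

    # Zero-pad short peptides so every R becomes a (dim, dim) matrix.
    padded = np.zeros((dim, dim))
    padded[: R.shape[0]] = R

    # R is upper triangular, so its upper triangle holds all the information.
    return padded[np.triu_indices(dim)]


short = embed_peptide("ACDEFG")                 # 4 overlapping 3-mers
long = embed_peptide("ACDEFGHIKLMNPQRSTVWY")    # 18 overlapping 3-mers
print(short.shape, long.shape)
```

Both peptides end up with feature vectors of the same length, which is what lets a standard classifier consume them. In this sketch R satisfies RᵀR = EᵀE, so it summarises the second-order statistics of the k-mer embeddings while discarding the number of rows.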
