PhosBERT: A self-supervised learning model for identifying phosphorylation sites in SARS-CoV-2-infected human cells.

Affiliations

Sichuan Vocational College of Health and Rehabilitation, Zigong 643000, Sichuan, China.

The People's Hospital of Ya'an, Ya'an 625000, Sichuan, China; The People's Hospital of Wenjiang Chengdu, Chengdu 611130, Sichuan, China.

Publication information

Methods. 2024 Oct;230:140-146. doi: 10.1016/j.ymeth.2024.08.004. Epub 2024 Aug 22.

Abstract

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a single-stranded RNA virus that mainly causes respiratory and enteric diseases and is responsible for the outbreak of coronavirus disease 2019 (COVID-19). Numerous studies have shown that SARS-CoV-2 infection significantly dysregulates the protein post-translational modification profile of human cells. Accurate recognition of phosphorylation sites in host cells will contribute to a deeper understanding of the pathogenic mechanisms of SARS-CoV-2 and help screen drugs and compounds with antiviral potential. There is therefore a need for cost-effective, high-precision computational strategies that specifically identify phosphorylation sites in SARS-CoV-2-infected cells. In this work, we first implemented a custom neural network model, named PhosBERT, on top of ProtBert, a pre-trained protein language model obtained by self-supervised learning on the Bidirectional Encoder Representations from Transformers (BERT) architecture. PhosBERT was then trained and validated with 5-fold cross-validation on a serine (S)/threonine (T) phosphorylation dataset and a tyrosine (Y) phosphorylation dataset, respectively. On independent validation, PhosBERT identified S/T phosphorylation sites with an accuracy of 81.9% and an AUC (area under the receiver operating characteristic curve) of 0.896, and Y phosphorylation sites with an accuracy of 87.1% and an AUC of 0.902. These results indicate that the proposed model has good predictive ability and stability and provides a new approach for studying phosphorylation sites in SARS-CoV-2-infected cells.
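The evaluation protocol described above (5-fold cross-validation, then accuracy and AUC on held-out data) can be illustrated with a minimal standard-library sketch. This is not the authors' code: the function names `kfold_indices`, `accuracy` and `roc_auc` are hypothetical, the plain shuffled (non-stratified) fold assignment and the 0.5 decision threshold are assumptions, and the toy scores stand in for the outputs of a binary site classifier such as PhosBERT. AUC is computed with the rank-sum (Mann-Whitney U) identity, i.e. the probability that a random positive site scores higher than a random negative one.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def kfold_indices(n: int, k: int = 5, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs for plain (non-stratified) k-fold CV.

    Assumption: samples are shuffled once and dealt into k near-equal folds;
    each fold is held out exactly once.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test


def accuracy(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of samples whose thresholded score matches the 0/1 label.

    Assumption: a site is predicted phosphorylated when score >= threshold.
    """
    correct = sum(int(s >= threshold) == y for y, s in zip(labels, scores))
    return correct / len(labels)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via the rank-sum (Mann-Whitney U) identity; tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Extend j over the run of samples tied with order[i].
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for t in range(i, j + 1):
            ranks[order[t]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


if __name__ == "__main__":
    # Toy labels/scores standing in for classifier outputs (not data from the paper).
    y = [0, 0, 1, 1]
    p = [0.1, 0.4, 0.35, 0.8]
    print(accuracy(y, p), roc_auc(y, p))  # 0.75 0.75
    for train, test in kfold_indices(10, k=5):
        print(len(train), len(test))  # 8 2 for each of the five folds
```

In a full pipeline, each fold's held-out scores would come from a model fine-tuned on that fold's training indices, and the per-fold metrics would typically be averaged; the figures reported in the abstract come from the authors' own independent validation, not from this sketch.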
