TransFactor-prediction of pro-viral SARS-CoV-2 host factors using a protein language model.

Authors

An Yang, Bergant Valter, Firmani Samuele, Grünke Corinna, Bonnal Batiste, Henrici Alexander, Pichlmair Andreas, Schubert Benjamin, Marsico Annalisa

Affiliations

Computational Health Center, Helmholtz Center Munich, Neuherberg 85764, Germany.

School of Computation, Information and Technology, Technical University of Munich, Munich 80333, Germany.

Publication information

Bioinformatics. 2025 Sep 1;41(9). doi: 10.1093/bioinformatics/btaf491.

DOI:10.1093/bioinformatics/btaf491
PMID:40929136
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12449051/
Abstract

MOTIVATION

Recent pandemics have revealed significant gaps in our understanding of viral pathogenesis, exposing an urgent need for methods to identify and prioritize key host proteins (host factors) as potential targets for antiviral treatments. De novo generation of experimental datasets is limited by their heterogeneity, and for looming future pandemics, may not be feasible due to limitations of experimental approaches.

RESULTS

Here, we present TransFactor, a computational framework for predicting and prioritizing candidate host factors using only protein sequence data. It leverages the pre-trained ESM-2 protein language model, fine-tuned on a limited set of experimentally determined host factors aggregated from 33 independent SARS-CoV-2 studies. TransFactor outperforms machine learning and deep learning baselines, and its predictions align with Gene Ontology enrichments of known host factors. It also provides interpretability through a computational alanine scan, which enables the identification of pro-viral protein domains such as COMM, PX, and RRM that may be used to direct experimental investigations of virus biology and guide the rational design of antiviral therapies. Our findings demonstrate the potential of transformer-based models to advance host factor prediction, providing a framework extendable to orthogonal input modalities and other infectious diseases, enhancing our preparedness for current and future viral threats.
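
The computational alanine scan mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation: `alanine_scan` and `toy_score` are hypothetical names, and `toy_score` is a deliberately trivial stand-in for a fine-tuned sequence classifier such as the one the abstract describes. The idea shown is only the general one: substitute each residue with alanine, re-score the mutant, and record how far the prediction drops.

```python
from typing import Callable, List, Tuple


def alanine_scan(
    sequence: str, score_fn: Callable[[str], float]
) -> List[Tuple[int, str, float]]:
    """Return (1-based position, original residue, score drop) per mutated position.

    score_fn is assumed to map a protein sequence to a host-factor score;
    a larger drop suggests the residue matters more for the prediction.
    """
    baseline = score_fn(sequence)
    effects = []
    for i, residue in enumerate(sequence):
        if residue == "A":
            continue  # replacing alanine with alanine changes nothing
        mutant = sequence[:i] + "A" + sequence[i + 1:]
        effects.append((i + 1, residue, baseline - score_fn(mutant)))
    return effects


def toy_score(sequence: str) -> float:
    """Hypothetical stand-in scorer: fraction of residues that are K or R."""
    return sum(residue in "KR" for residue in sequence) / len(sequence)


# Toy input sequence, chosen only for illustration.
effects = alanine_scan("MKRAL", toy_score)
for position, residue, drop in effects:
    print(f"{residue}{position}A: score drop {drop:.2f}")
```

In practice `score_fn` would wrap a trained model's forward pass, and per-residue drops could be aggregated over annotated domain boundaries to rank domains; both of those steps are assumptions here rather than details stated in the abstract.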

AVAILABILITY AND IMPLEMENTATION

Source code is available at https://github.com/marsico-lab/TransFactor. A full reproducibility package, including code, trained models, and data, is archived on Zenodo (https://doi.org/10.5281/zenodo.16793684).

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/70f8/12449051/21946daf010f/btaf491f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/70f8/12449051/3a0ea6355d87/btaf491f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/70f8/12449051/a538e444eb94/btaf491f3.jpg
