Stevens Danielle M, Yang David, Liang Tatiana J, Li Tianrun, Vega Brandon, Coaker Gitta L, Krasileva Ksenia
Plant and Microbial Biology, University of California, Berkeley, Berkeley CA 94720, USA.
Center for Computational Biology, University of California, Berkeley, Berkeley CA 94720, USA.
bioRxiv. 2025 Jul 15:2025.07.11.664399. doi: 10.1101/2025.07.11.664399.
Eukaryotes detect biomolecules through surface-localized receptors, which are key signaling components. A subset of these receptors surveys for pathogens, induces immunity, and restricts pathogen growth. Comparative genomics of both hosts and pathogens has unveiled vast sequence variation in receptors and potential ligands, creating an experimental bottleneck. We have developed mamp-ml, a machine learning framework for predicting plant receptor-ligand interactions. We leveraged existing functional data from over two decades of foundational research, together with the large protein language model ESM-2, to build a pipeline and model that predicts immunogenic outcomes from a combination of receptor and ligand features. Our model achieves 73% prediction accuracy on a held-out test set, even when an experimental structure is lacking. Our approach enables high-throughput screening of leucine-rich repeat (LRR) receptor-ligand combinations and provides a computational framework for engineering plant immune systems.
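To make the idea of combining receptor and ligand features concrete, the following is a minimal sketch and not the mamp-ml implementation. It assumes a toy amino-acid composition vector as a stand-in for ESM-2 embeddings, a one-nearest-neighbor rule as a stand-in for the trained model, and hypothetical function names, labels, and sequences throughout. It also shows how accuracy on a held-out set is computed.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Toy stand-in for a protein language model embedding (assumption):
    the fraction of each of the 20 standard amino acids in the sequence."""
    counts = Counter(seq.upper())
    total = max(len(seq), 1)
    return [counts.get(aa, 0) / total for aa in AMINO_ACIDS]


def pair_features(receptor_seq, ligand_seq):
    """Concatenate receptor and ligand vectors into one feature vector."""
    return composition(receptor_seq) + composition(ligand_seq)


def predict_1nn(train, query):
    """Return the label of the training example closest to the query.
    `train` is a list of (feature_vector, label) tuples."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(train, key=lambda item: sqdist(item[0], query))[1]


def accuracy(y_true, y_pred):
    """Fraction of held-out examples whose predicted label is correct."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Made-up placeholder sequences and labels, for illustration only.
    train = [
        (pair_features("AAAA", "GGGG"), "immunogenic"),
        (pair_features("LLLL", "KKKK"), "non-immunogenic"),
    ]
    held_out = [("AAAL", "GGGG", "immunogenic"), ("LLLA", "KKKK", "non-immunogenic")]
    preds = [predict_1nn(train, pair_features(r, l)) for r, l, _ in held_out]
    print(accuracy([label for _, _, label in held_out], preds))
```

In a real pipeline, the composition vectors would be replaced by learned embeddings and the nearest-neighbor rule by a trained classifier; the held-out evaluation step would stay the same.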