Application of Protein Structure Encodings and Sequence Embeddings for Transporter Substrate Prediction.

Authors

Denger Andreas, Helms Volkhard

Affiliation

Center for Bioinformatics, Saarland University, 66123 Saarbrücken, Germany.

Publication information

Molecules. 2025 Aug 1;30(15):3226. doi: 10.3390/molecules30153226.

Abstract

Membrane transporters play a crucial role in any cell. Identifying the substrates they translocate across membranes is important for many fields of research, such as metabolomics, pharmacology, and biotechnology. In this study, we leverage recent advances in deep learning, such as amino acid sequence embeddings with protein language models (pLMs), highly accurate 3D structure predictions with AlphaFold 2, and structure-encoding 3Di sequences from Foldseek, to predict the substrates of membrane transporters. We test new deep learning features derived from both sequence and structure, and compare them to the previously best-performing protein encodings, which consisted of amino acid k-mer frequencies and evolutionary information from position-specific scoring matrices (PSSMs). Furthermore, we compare the performance of these features using either a previously developed support vector machine (SVM) model or a regularized feedforward neural network (FNN). When evaluating these models on sugar and amino acid carriers in , as well as on three types of ion channels in humans, we found that both the deep learning-based features and the FNN model led to better and more consistent classification performance than previous methods. Direct encodings of 3D structures with Foldseek, as well as structural embeddings with ProstT5, matched the performance of state-of-the-art amino acid sequence embeddings calculated with the ProtT5-XL model when used as input for the FNN classifier.

