Denger Andreas, Helms Volkhard
Center for Bioinformatics, Saarland University, 66123 Saarbrücken, Germany.
Molecules. 2025 Aug 1;30(15):3226. doi: 10.3390/molecules30153226.
Membrane transporters play a crucial role in every cell. Identifying the substrates they translocate across membranes is important for many fields of research, such as metabolomics, pharmacology, and biotechnology. In this study, we leverage recent advances in deep learning to predict the substrates of membrane transporters: amino acid sequence embeddings from protein language models (pLMs), highly accurate 3D structure predictions from AlphaFold 2, and structure-encoding 3Di sequences from Foldseek. We test new deep learning (DL) features derived from both sequence and structure, and compare them to the previously best-performing protein encodings, which consisted of amino acid k-mer frequencies and evolutionary information from position-specific scoring matrices (PSSMs). Furthermore, we compare the performance of these features when used with either a previously developed support vector machine (SVM) model or a regularized feedforward neural network (FNN). When evaluating these models on sugar and amino acid carriers in , as well as on three types of ion channels in humans, we found that both the DL-based features and the FNN model led to better and more consistent classification performance than previous methods. When used as input for the FNN classifier, direct encodings of 3D structures with Foldseek, as well as structural embeddings with ProstT5, matched the performance of state-of-the-art amino acid sequence embeddings calculated with the ProtT5-XL model.
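The abstract names amino acid k-mer frequencies as part of the earlier baseline encoding. As an illustration only, the following minimal sketch shows one common way such a feature vector can be computed; it is not the authors' implementation. The function name `kmer_frequencies`, the default `k = 2`, and the choice to skip windows containing non-standard residues are all assumptions made for this example.

```python
from itertools import product

# The 20 standard amino acids, in alphabetical order of their one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_frequencies(sequence: str, k: int = 2) -> list[float]:
    """Return relative frequencies of all 20**k k-mers for one sequence.

    Hypothetical helper for illustration: the output order follows
    itertools.product over AMINO_ACIDS, and windows that contain a
    non-standard residue (e.g. 'X') are skipped (an assumption).
    """
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    total = 0
    # Slide a window of length k over the sequence and count each k-mer.
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        if window in index:
            counts[index[window]] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    # Normalize counts so the vector sums to 1 and is length-independent.
    return [c / total for c in counts]


if __name__ == "__main__":
    features = kmer_frequencies("MKTAYIAKQR", k=2)
    print(len(features))  # one entry per possible dipeptide
```

A fixed-length vector like this can be passed to any standard classifier, which is why such encodings serve as a convenient baseline against learned embeddings.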