Using word embedding technique to efficiently represent protein sequences for identifying substrate specificities of transporters.

Affiliations

Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, 32003, Taiwan.

School of Humanities, Nanyang Technological University, 48 Nanyang Ave, Singapore, 639798.

Publication information

Anal Biochem. 2019 Jul 15;577:73-81. doi: 10.1016/j.ab.2019.04.011. Epub 2019 Apr 22.

DOI: 10.1016/j.ab.2019.04.011
PMID: 31022378

Abstract

Membrane transport proteins and their substrate specificities play crucial roles in various cellular functions. Identifying the substrate specificities of membrane transport proteins is closely related to protein-target interaction prediction, drug design, membrane recruitment, and dysregulation analysis, which makes it an important problem for bioinformatics researchers. In this study, we applied the word embedding approach, a main driver of the recent breakthroughs in natural language processing, to the protein sequences of transporters. We represented each protein sequence by the word embeddings and frequencies of its biological words. The resulting protein features were then fed into machine learning models for prediction. We also varied the length of the biological words that make up each protein sequence to find the length that generated the most discriminative feature set. Compared with four other feature types created from protein sequences, our proposed features help prediction models achieve superior performance. Our best models reach an average area under the curve of 0.96 on 5-fold cross-validation and 0.99 on the independent test. With this result, our study can help biologists identify transporters based on their substrate specificities, and it provides a basis for further research on applying natural language processing techniques in bioinformatics.
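The abstract does not spell out how embeddings and word frequencies are combined, so the following is only a minimal sketch of one plausible reading: split each sequence into overlapping k-mers ("biological words"), look up a vector for each word, and take the frequency-weighted mean as the sequence's feature vector. The function names, the example sequences, and the randomly generated embedding table are all assumptions made for illustration; a real pipeline would train the embeddings on a sequence corpus and would not necessarily use this weighting.

```python
import random
from collections import Counter


def biological_words(sequence, k):
    """Split a protein sequence into overlapping k-mers ("biological words")."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def toy_embeddings(words, dim, seed=0):
    """Hypothetical stand-in for trained embeddings: one random vector per word."""
    rng = random.Random(seed)
    return {w: [rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for w in sorted(set(words))}


def sequence_features(sequence, k, embeddings, dim):
    """Frequency-weighted mean of the embeddings of a sequence's words."""
    counts = Counter(biological_words(sequence, k))
    total = sum(counts.values())
    features = [0.0] * dim
    for word, count in counts.items():
        vector = embeddings.get(word)
        if vector is None:  # word missing from the vocabulary: skip it
            continue
        weight = count / total  # relative frequency of this word
        for j in range(dim):
            features[j] += weight * vector[j]
    return features


# Made-up sequences, used only to exercise the functions.
sequences = ["MKTLLVAGLL", "MKVLAAGLLK"]
K, DIM = 3, 4  # word length and embedding size (arbitrary choices)

vocab = [w for s in sequences for w in biological_words(s, K)]
table = toy_embeddings(vocab, DIM)
matrix = [sequence_features(s, K, table, DIM) for s in sequences]
```

Each row of `matrix` could then be passed to a classifier. Repeating the procedure for several values of `K` and comparing validation scores would mirror the word-length search the abstract describes.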

Similar articles

1
Using word embedding technique to efficiently represent protein sequences for identifying substrate specificities of transporters.
Anal Biochem. 2019 Jul 15;577:73-81. doi: 10.1016/j.ab.2019.04.011. Epub 2019 Apr 22.
2
TNFPred: identifying tumor necrosis factors using hybrid features based on word embeddings.
BMC Med Genomics. 2020 Oct 22;13(Suppl 10):155. doi: 10.1186/s12920-020-00779-w.
3
Prediction of membrane transport proteins and their substrate specificities using primary sequence information.
PLoS One. 2014 Jun 26;9(6):e100278. doi: 10.1371/journal.pone.0100278. eCollection 2014.
4
Using Language Representation Learning Approach to Efficiently Identify Protein Complex Categories in Electron Transport Chain.
Mol Inform. 2020 Oct;39(10):e2000033. doi: 10.1002/minf.202000033. Epub 2020 Jul 16.
5
Prediction the Substrate Specificities of Membrane Transport Proteins Based on Support Vector Machine and Hybrid Features.
IEEE/ACM Trans Comput Biol Bioinform. 2016 Sep-Oct;13(5):947-953. doi: 10.1109/TCBB.2015.2495140. Epub 2015 Nov 11.
6
Use Chou's 5-Steps Rule With Different Word Embedding Types to Boost Performance of Electron Transport Protein Prediction Model.
IEEE/ACM Trans Comput Biol Bioinform. 2022 Mar-Apr;19(2):1235-1244. doi: 10.1109/TCBB.2020.3010975. Epub 2022 Apr 1.
7
ActTRANS: Functional classification in active transport proteins based on transfer learning and contextual representations.
Comput Biol Chem. 2021 Aug;93:107537. doi: 10.1016/j.compbiolchem.2021.107537. Epub 2021 Jun 29.
8
iEnhancer-5Step: Identifying enhancers using hidden information of DNA sequences via Chou's 5-step rule and word embedding.
Anal Biochem. 2019 Apr 15;571:53-61. doi: 10.1016/j.ab.2019.02.017. Epub 2019 Feb 26.
9
Modeling aspects of the language of life through transfer-learning protein sequences.
BMC Bioinformatics. 2019 Dec 17;20(1):723. doi: 10.1186/s12859-019-3220-8.
10
Molecular Properties of Drugs Interacting with SLC22 Transporters OAT1, OAT3, OCT1, and OCT2: A Machine-Learning Approach.
J Pharmacol Exp Ther. 2016 Oct;359(1):215-29. doi: 10.1124/jpet.116.232660. Epub 2016 Aug 3.

Cited by

1
Deep learning neural network development for the classification of bacteriocin sequences produced by lactic acid bacteria.
F1000Res. 2025 Jun 20;13:981. doi: 10.12688/f1000research.154432.2. eCollection 2024.
2
Fungi-Kcr: a language model for predicting lysine crotonylation in pathogenic fungal proteins.
Front Cell Infect Microbiol. 2025 Jul 15;15:1615443. doi: 10.3389/fcimb.2025.1615443. eCollection 2025.
3
NA_mCNN: Classification of Sodium Transporters in Membrane Proteins by Integrating Multi-Window Deep Learning and ProtTrans for Their Therapeutic Potential.
J Proteome Res. 2025 May 2;24(5):2324-2335. doi: 10.1021/acs.jproteome.4c00884. Epub 2025 Apr 7.
4
Enhanced identification of membrane transport proteins: a hybrid approach combining ProtBERT-BFD and convolutional neural networks.
J Integr Bioinform. 2023 Jul 28;20(2). doi: 10.1515/jib-2022-0055. eCollection 2023 Jun 1.
5
ISTRF: Identification of sucrose transporter using random forest.
Front Genet. 2022 Sep 12;13:1012828. doi: 10.3389/fgene.2022.1012828. eCollection 2022.
6
Representation learning applications in biological sequence analysis.
Comput Struct Biotechnol J. 2021 May 23;19:3198-3208. doi: 10.1016/j.csbj.2021.05.039. eCollection 2021.
7
TNFPred: identifying tumor necrosis factors using hybrid features based on word embeddings.
BMC Med Genomics. 2020 Oct 22;13(Suppl 10):155. doi: 10.1186/s12920-020-00779-w.
8
TooT-T: discrimination of transport proteins from non-transport proteins.
BMC Bioinformatics. 2020 Apr 23;21(Suppl 3):25. doi: 10.1186/s12859-019-3311-6.
9
Trader as a new optimization algorithm predicts drug-target interactions efficiently.
Sci Rep. 2019 Jun 27;9(1):9348. doi: 10.1038/s41598-019-45814-8.