Use Chou's 5-Steps Rule With Different Word Embedding Types to Boost Performance of Electron Transport Protein Prediction Model.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2022 Mar-Apr;19(2):1235-1244. doi: 10.1109/TCBB.2020.3010975. Epub 2022 Apr 1.

DOI:10.1109/TCBB.2020.3010975
PMID:32750894
Abstract

Living organisms obtain the energy they need directly from cellular respiration. Electron storage and transport are completed during cellular respiration with the aid of electron transport chains, so identifying electron transport proteins is an important task. How well these proteins can be identified depends heavily on the choice of feature extraction method and machine learning algorithm. In this study, protein sequences were treated as natural language sentences composed of words. The proposed word embedding-based feature sets, which build on word embedding modulation and protein motif frequencies, proved useful for feature selection. Five word embedding types and a variety of combined features were examined for this purpose. A support vector machine was then used for classification. In 5-fold cross-validation, average accuracy, specificity, sensitivity, and MCC all exceeded 0.95. In the independent test, these metrics were 96.82 percent, 97.16 percent, 95.76 percent, and 0.9, respectively. Compared with state-of-the-art predictors, the proposed method performed better on all metrics, indicating its effectiveness in identifying electron transport proteins. The study also offers insights into the applicability of various word embeddings for understanding the surveyed sequences.
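The two ideas the abstract leans on, reading a protein sequence as a sentence of words and scoring a binary classifier by accuracy, sensitivity, specificity, and MCC, can be sketched in plain Python. This is a minimal illustration, not the authors' pipeline: the abstract does not say how words are formed, so the overlapping 3-mer split is an assumption, and the function names `kmer_words` and `binary_metrics`, the example sequence, and the confusion-matrix counts are all hypothetical.

```python
import math


def kmer_words(sequence, k=3):
    """Split a protein sequence into overlapping k-mers ("words").

    Assumption: overlapping windows with k=3; the abstract does not
    specify the word definition used in the paper.
    """
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def binary_metrics(tp, tn, fp, fn):
    """Compute the four metrics named in the abstract from confusion counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    # Matthews correlation coefficient; defined as 0 when any margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical sequence and counts, for illustration only.
    print(kmer_words("MKVLA"))
    print(binary_metrics(tp=90, tn=95, fp=5, fn=10))
```

In a setup like the one the abstract describes, the word lists would be fed to an embedding model and the resulting vectors to a classifier; the metric function would then be applied to each cross-validation fold and averaged.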

Similar articles

1
Use Chou's 5-Steps Rule With Different Word Embedding Types to Boost Performance of Electron Transport Protein Prediction Model.
IEEE/ACM Trans Comput Biol Bioinform. 2022 Mar-Apr;19(2):1235-1244. doi: 10.1109/TCBB.2020.3010975. Epub 2022 Apr 1.
2
Using Language Representation Learning Approach to Efficiently Identify Protein Complex Categories in Electron Transport Chain.
Mol Inform. 2020 Oct;39(10):e2000033. doi: 10.1002/minf.202000033. Epub 2020 Jul 16.
3
Using word embedding technique to efficiently represent protein sequences for identifying substrate specificities of transporters.
Anal Biochem. 2019 Jul 15;577:73-81. doi: 10.1016/j.ab.2019.04.011. Epub 2019 Apr 22.
4
iEnhancer-5Step: Identifying enhancers using hidden information of DNA sequences via Chou's 5-step rule and word embedding.
Anal Biochem. 2019 Apr 15;571:53-61. doi: 10.1016/j.ab.2019.02.017. Epub 2019 Feb 26.
5
TNFPred: identifying tumor necrosis factors using hybrid features based on word embeddings.
BMC Med Genomics. 2020 Oct 22;13(Suppl 10):155. doi: 10.1186/s12920-020-00779-w.
6
DPP-PseAAC: A DNA-binding protein prediction model using Chou's general PseAAC.
J Theor Biol. 2018 Sep 7;452:22-34. doi: 10.1016/j.jtbi.2018.05.006. Epub 2018 May 16.
7
Effective DNA binding protein prediction by using key features via Chou's general PseAAC.
J Theor Biol. 2019 Jan 7;460:64-78. doi: 10.1016/j.jtbi.2018.10.027. Epub 2018 Oct 11.
8
Incorporating Distance-Based Top-n-gram and Random Forest To Identify Electron Transport Proteins.
J Proteome Res. 2019 Jul 5;18(7):2931-2939. doi: 10.1021/acs.jproteome.9b00250. Epub 2019 Jun 3.
9
iN6-methylat (5-step): identifying DNA N6-methyladenine sites in rice genome using continuous bag of nucleobases via Chou's 5-step rule.
Mol Genet Genomics. 2019 Oct;294(5):1173-1182. doi: 10.1007/s00438-019-01570-y. Epub 2019 May 4.
10
ActTRANS: Functional classification in active transport proteins based on transfer learning and contextual representations.
Comput Biol Chem. 2021 Aug;93:107537. doi: 10.1016/j.compbiolchem.2021.107537. Epub 2021 Jun 29.

Cited by

1
Protein Sequence Analysis landscape: A Systematic Review of Task Types, Databases, Datasets, Word Embeddings Methods, and Language Models.
Database (Oxford). 2025 May 30;2025. doi: 10.1093/database/baaf027.
2
Large-scale comparative review and assessment of computational methods for phage virion proteins identification.
EXCLI J. 2022 Jan 3;21:11-29. doi: 10.17179/excli2021-4411. eCollection 2022.
3
A Multi-Scale and Multi-Level Fusion Approach for Deep Learning-Based Liver Lesion Diagnosis in Magnetic Resonance Images with Visual Explanation.
Life (Basel). 2021 Jun 18;11(6):582. doi: 10.3390/life11060582.