MFSC: Multi-voting based feature selection for classification of Golgi proteins by adopting the general form of Chou's PseAAC components.

Affiliations

Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan.

Publication information

J Theor Biol. 2019 Feb 21;463:99-109. doi: 10.1016/j.jtbi.2018.12.017. Epub 2018 Dec 15.

DOI:10.1016/j.jtbi.2018.12.017
PMID:30562500
Abstract

Automatic identification of protein subcellular localization has gained much popularity in the last few decades. Subcellular localization is useful in the diagnosis of different diseases as well as in drug development. Golgi is a vital type of protein that provides a means of transportation for several other proteins destined for the lysosome, the plasma membrane, secretion, and so on. Cis-Golgi and trans-Golgi are the two ends of the Golgi protein, meant for the reception and transmission of various substances. Dysfunction in Golgi proteins may lead to different types of diseases, especially inheritable and neurodegenerative disorders. Given their significance, it is essential to identify Golgi proteins correctly. In this paper, a novel, high-throughput computational model is proposed that can identify sub-Golgi proteins precisely. Discrete and evolutionary feature extraction schemes are applied so that all the salient, noise-free, and relevant information in the protein sequences can be captured. Unfortunately, the publicly available benchmark dataset is quite imbalanced: trans-Golgi sequences constitute 72% of the whole dataset, which introduces bias and redundancy and hinders generalization. To overcome the limitations of imbalanced data, the Synthetic Minority Over-sampling Technique (SMOTE) is used to balance the number of instances across the classes of the dataset. In addition, a condensed feature space is formed by fusing the high-ranking features of eleven different feature selection techniques. The high-ranking features are selected through a majority voting algorithm; consequently, the feature space is reduced by 85%. The experimental results demonstrate that the kNN classifier obtained promising results in combination with the hybrid feature space. It yielded an accuracy of 98% in jackknife cross-validation, 94% on independent data, and 96% in the 10-fold cross-validation test. It is ascertained that the proposed model is reliable and consistent and can serve as a valuable tool for the research community.
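
The majority-voting step described in the abstract can be illustrated with a minimal sketch. The abstract does not name the eleven selectors, the per-selector cutoff, or the vote threshold, so everything below is an assumption: each selector is represented only by a list of per-feature scores, `k` is a hypothetical cutoff, and a feature is kept when more than half of the selectors rank it in their top `k`. The score values are invented toy numbers, not results from the paper.

```python
from collections import Counter


def top_k_by_score(scores, k):
    """Return the indices of the k highest-scoring features."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:k])


def majority_vote_features(score_lists, k, min_votes=None):
    """Keep features that enough selectors place in their top k.

    By default a strict majority of selectors is required; the threshold
    used in the paper is not stated in the abstract, so this is an assumption.
    """
    if min_votes is None:
        min_votes = len(score_lists) // 2 + 1
    votes = Counter()
    for scores in score_lists:
        votes.update(top_k_by_score(scores, k))
    return sorted(f for f, v in votes.items() if v >= min_votes)


# Hypothetical scores from three selectors over five features.
selector_scores = [
    [0.9, 0.1, 0.8, 0.2, 0.3],
    [0.7, 0.2, 0.1, 0.9, 0.3],
    [0.8, 0.6, 0.7, 0.1, 0.2],
]
selected = majority_vote_features(selector_scores, k=2)
print(selected)
```

With these toy scores, feature 0 receives three votes and feature 2 receives two, so both survive a strict majority; raising `min_votes` to 3 would keep only feature 0.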


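The oversampling idea behind SMOTE, which the abstract applies to the imbalanced benchmark, can be sketched in plain Python. This is a simplified illustration, not the authors' implementation: the function name `smote_like`, the neighbour count `k`, the fixed seed, and the toy coordinates are all assumptions. Each synthetic point is placed at a random position on the segment between a minority sample and one of its nearest minority neighbours.

```python
import math
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours.

    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest neighbours of `base` among the other minority samples.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data (hypothetical): 7 majority samples vs 3 minority samples.
majority = [[float(i), float(i % 3)] for i in range(7)]
minority = [[10.0, 10.0], [11.0, 10.5], [10.5, 12.0]]
new_points = smote_like(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

Because every synthetic point is a convex combination of two real minority samples, it always lies inside the region the minority class already occupies. In practice, oversampling is usually applied only to the training portion of each split so that synthetic points do not leak into the test set.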
Similar articles

1. MFSC: Multi-voting based feature selection for classification of Golgi proteins by adopting the general form of Chou's PseAAC components.
   J Theor Biol. 2019 Feb 21;463:99-109. doi: 10.1016/j.jtbi.2018.12.017. Epub 2018 Dec 15.
2. isGPT: An optimized model to identify sub-Golgi protein types using SVM and Random Forest based feature selection.
   Artif Intell Med. 2018 Jan;84:90-100. doi: 10.1016/j.artmed.2017.11.003. Epub 2017 Nov 26.
3. A Novel Feature Extraction Method with Feature Selection to Identify Golgi-Resident Protein Types from Imbalanced Data.
   Int J Mol Sci. 2016 Feb 6;17(2):218. doi: 10.3390/ijms17020218.
4. Intelligent computational model for classification of sub-Golgi protein using oversampling and fisher feature selection methods.
   Artif Intell Med. 2017 May;78:14-22. doi: 10.1016/j.artmed.2017.05.001. Epub 2017 May 10.
5. Prediction of Golgi-resident protein types using general form of Chou's pseudo-amino acid compositions: Approaches with minimal redundancy maximal relevance feature selection.
   J Theor Biol. 2016 Aug 7;402:38-44. doi: 10.1016/j.jtbi.2016.04.032. Epub 2016 May 4.
6. Prediction of protein subcellular localization with oversampling approach and Chou's general PseAAC.
   J Theor Biol. 2018 Jan 21;437:239-250. doi: 10.1016/j.jtbi.2017.10.030. Epub 2017 Oct 31.
7. DPP-PseAAC: A DNA-binding protein prediction model using Chou's general PseAAC.
   J Theor Biol. 2018 Sep 7;452:22-34. doi: 10.1016/j.jtbi.2018.05.006. Epub 2018 May 16.
8. Prediction of Protein Submitochondrial Locations by Incorporating Dipeptide Composition into Chou's General Pseudo Amino Acid Composition.
   J Membr Biol. 2016 Jun;249(3):293-304. doi: 10.1007/s00232-015-9868-8. Epub 2016 Jan 8.
9. iMem-2LSAAC: A two-level model for discrimination of membrane proteins and their types by extending the notion of SAAC into Chou's pseudo amino acid composition.
   J Theor Biol. 2018 Apr 7;442:11-21. doi: 10.1016/j.jtbi.2018.01.008. Epub 2018 Jan 11.
10. Predicting apoptosis protein subcellular localization by integrating auto-cross correlation and PSSM into Chou's PseAAC.
    J Theor Biol. 2018 Nov 14;457:163-169. doi: 10.1016/j.jtbi.2018.08.042. Epub 2018 Sep 1.

Cited by

1. GASIDN: identification of sub-Golgi proteins with multi-scale feature fusion.
   BMC Genomics. 2024 Oct 30;25(1):1019. doi: 10.1186/s12864-024-10954-3.
2. Identification of plant vacuole proteins by using graph neural network and contact maps.
   BMC Bioinformatics. 2023 Sep 22;24(1):357. doi: 10.1186/s12859-023-05475-x.
3. Identification of intelligence-related proteins through a robust two-layer predictor.
   Commun Integr Biol. 2022 Nov 15;15(1):253-264. doi: 10.1080/19420889.2022.2143101. eCollection 2022.
4. iTAGPred: A Two-Level Prediction Model for Identification of Angiogenesis and Tumor Angiogenesis Biomarkers.
   Appl Bionics Biomech. 2021 Sep 27;2021:2803147. doi: 10.1155/2021/2803147. eCollection 2021.
5. Identification of sub-Golgi protein localization by use of deep representation learning features.
   Bioinformatics. 2021 Apr 5;36(24):5600-5609. doi: 10.1093/bioinformatics/btaa1074.
6. Variable selection from a feature representing protein sequences: a case of classification on bacterial type IV secreted effectors.
   BMC Bioinformatics. 2020 Oct 27;21(1):480. doi: 10.1186/s12859-020-03826-6.
7. Identify Lysine Neddylation Sites Using Bi-profile Bayes Feature Extraction the Chou's 5-steps Rule and General Pseudo Components.
   Curr Genomics. 2019 Dec;20(8):592-601. doi: 10.2174/1389202921666191223154629.
8. iSulfoTyr-PseAAC: Identify Tyrosine Sulfation Sites by Incorporating Statistical Moments Chou's 5-steps Rule and Pseudo Components.
   Curr Genomics. 2019 May;20(4):306-320. doi: 10.2174/1389202920666190819091609.
9. Some illuminating remarks on molecular genetics and genomics as well as drug development.
   Mol Genet Genomics. 2020 Mar;295(2):261-274. doi: 10.1007/s00438-019-01634-z. Epub 2020 Jan 1.
10. RAACBook: a web server of reduced amino acid alphabet for sequence-dependent inference by using Chou's five-step rule.
    Database (Oxford). 2019 Jan 1;2019. doi: 10.1093/database/baz131.