A similarity distance of diversity measure for discriminating mesophilic and thermophilic proteins.

Affiliation

School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China.

Publication information

Amino Acids. 2013 Feb;44(2):573-80. doi: 10.1007/s00726-012-1374-z. Epub 2012 Aug 1.

DOI: 10.1007/s00726-012-1374-z
PMID: 22851052

Abstract

Accurate prediction of thermophilic proteins is useful for designing stable enzymes that remain functional at high temperature. We used the increment of diversity (ID), a novel amino acid composition-based similarity distance, in a two-class K-nearest neighbor (KNN) classifier to discriminate thermophilic from mesophilic proteins, yielding the KNN-ID classifier. Instead of extracting features from protein sequences as in previous work, our approach is based on a diversity measure of symbol sequences. The similarity distance between each pair of protein sequences is first calculated to quantify how similar one sequence is to the other. The class of the query protein is then determined using the K-nearest neighbor algorithm. Comparisons with several recently published methods showed that KNN-ID outperforms them. The improved predictive performance indicates that it is a simple and effective classifier for discriminating thermophilic and mesophilic proteins. Finally, the influence of protein length and protein identity on prediction accuracy is discussed. The prediction model and dataset used in this article can be freely downloaded from http://wlxy.imu.edu.cn/college/biostation/fuwu/KNN-ID/index.htm .
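
The two steps described in the abstract, a pairwise composition-based distance followed by a K-nearest neighbor vote, can be illustrated with a minimal sketch. It assumes the commonly used definition of the increment of diversity, ID(X, Y) = D(X + Y) - D(X) - D(Y) with D(X) = N log N - Σ n_i log n_i over the 20 amino acid counts; the abstract does not state the formula, so the exact distance, the value of K, and any normalization used in the paper may differ. The function names and the toy sequences below are hypothetical and are not taken from the authors' dataset.

```python
import math
from collections import Counter

# The 20 standard amino acids; other symbols are ignored (an assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Return the amino acid counts of a sequence as a 20-element list."""
    counts = Counter(seq.upper())
    return [counts.get(a, 0) for a in AMINO_ACIDS]


def diversity(counts):
    """Diversity measure D = N log N - sum(n_i log n_i), with 0 log 0 = 0."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(n * math.log(n) for n in counts if n > 0)


def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y); small values mean similar compositions."""
    merged = [a + b for a, b in zip(x, y)]
    return diversity(merged) - diversity(x) - diversity(y)


def knn_predict(query, training, k=3):
    """Label a query sequence by majority vote among its k nearest neighbors."""
    q = composition(query)
    scored = sorted(
        (increment_of_diversity(q, composition(seq)), label)
        for seq, label in training
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Toy, made-up sequences for illustration only (not real proteins).
toy_training = [
    ("EEEKKKRRR", "thermophilic"),
    ("EEKKRRE", "thermophilic"),
    ("QQQNNNSSS", "mesophilic"),
    ("QQNNSST", "mesophilic"),
]
print(knn_predict("EKREKR", toy_training, k=3))
```

Because ID depends only on residue counts, two sequences with proportional compositions have a distance of zero regardless of their lengths, which is one reason the effect of protein length on accuracy is worth examining separately.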

