FermatS: A Novel Numerical Representation for Protein Sequence Comparison and DNA-binding Protein Identification.

Authors

Zhang Yanping, Gao Ya, Ni Jianwei, Chen Pengcheng, Wang Xiaosheng

Affiliation

School of Mathematics and Physics Science and Engineering, Hebei University of Engineering, Handan 056038, China.

Publication information

Comb Chem High Throughput Screen. 2021;24(10):1746-1753. doi: 10.2174/1386207323999201117111738.

DOI:10.2174/1386207323999201117111738
PMID:33208064
Abstract

AIMS

Based on protein sequence information, a simple and effective method was used to analyze protein sequence similarity and to predict DNA-binding proteins.

BACKGROUND

Given the rapidly growing volume of available molecular biology data, low-complexity computational methods are needed to accurately infer protein structure, function, and evolution.

OBJECTIVE

Given the rapidly growing volume of available molecular biology data, it is important to develop novel computational algorithms for analyzing and comparing protein sequences.

METHODS

The numerical characteristics of a protein sequence are obtained from three sources: a global position representation based on Fermat spiral curves, a local position representation based on the normalized moments of inertia of those curves, and the composition of the 20 amino acids.
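
As a rough illustration of two ingredients named above, the following Python sketch computes a 20-dimensional amino acid composition vector and a point on a Fermat spiral, using the standard definition r = a * sqrt(theta). The abstract does not state how residues are assigned to the spiral or how the moments of inertia are normalized, so none of that is reproduced here; the function names, the residue ordering in `AMINO_ACIDS`, and the scale parameter `a` are assumptions made for this example only.

```python
import math
from collections import Counter

# The 20 standard residues in one-letter code (ordering is an arbitrary choice here).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the 20 residue frequencies of a sequence, in AMINO_ACIDS order."""
    # Non-standard symbols (e.g. X, B, Z) are ignored in this sketch.
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def fermat_spiral_point(theta, a=1.0):
    """Cartesian point on the Fermat spiral r = a * sqrt(theta), for theta >= 0."""
    r = a * math.sqrt(theta)
    return (r * math.cos(theta), r * math.sin(theta))
```

For example, `amino_acid_composition("AAC")` yields a vector whose entries sum to 1, with 2/3 in the position for A and 1/3 in the position for C.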

RESULTS

The method was applied to analyze the similarity/dissimilarity of nine ND5 proteins, and the results are consistent with biological evolution theory. Furthermore, we used logistic regression with 5-fold cross-validation to build a DNA-binding protein prediction model, which outperformed DNAbinder, iDNA-prot, DNA-prot, and gDNA-prot by 0.0069-0.609 in F-measure and by 0.293-0.898 in MCC on an unbalanced dataset.
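
For readers unfamiliar with the two reported metrics, the sketch below computes the F-measure (taken here to be the F1 score, which is an assumption, since the abstract does not specify the weighting) and the Matthews correlation coefficient (MCC) from confusion-matrix counts. MCC uses all four cells of the confusion matrix, which is why it is often preferred on unbalanced datasets. The function names and the choice to return 0.0 when a denominator is zero are conventions of this example, not taken from the paper.

```python
import math


def f_measure(tp, fp, fn):
    """F1 score: harmonic mean of precision and recall, from confusion counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient, in the range [-1, 1]."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Returning 0.0 for an undefined MCC is a common convention, assumed here.
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

A perfect classifier gives 1.0 for both metrics, while a classifier that gets every prediction wrong gives an MCC of -1.0.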

CONCLUSION

These results show that our method, FermatS, is effective for comparing, recognizing, and predicting protein sequences.

Similar articles

1
FermatS: A Novel Numerical Representation for Protein Sequence Comparison and DNA-binding Protein Identification.
Comb Chem High Throughput Screen. 2021;24(10):1746-1753. doi: 10.2174/1386207323999201117111738.
2
gDNA-Prot: Predict DNA-binding proteins by employing support vector machine and a novel numerical characterization of protein sequence.
J Theor Biol. 2016 Oct 7;406:8-16. doi: 10.1016/j.jtbi.2016.06.002. Epub 2016 Jul 1.
3
Protein Sequence Comparison and DNA-binding Protein Identification with Generalized PseAAC and Graphical Representation.
Comb Chem High Throughput Screen. 2018;21(2):100-110. doi: 10.2174/1386207321666180130100838.
4
iDNA-Prot: identification of DNA binding proteins using random forest with grey model.
PLoS One. 2011;6(9):e24756. doi: 10.1371/journal.pone.0024756. Epub 2011 Sep 15.
5
iDNA-Prot|dis: identifying DNA-binding proteins by incorporating amino acid distance-pairs and reduced alphabet profile into the general pseudo amino acid composition.
PLoS One. 2014 Sep 3;9(9):e106691. doi: 10.1371/journal.pone.0106691. eCollection 2014.
6
One novel representation of DNA sequence based on the global and local position information.
Sci Rep. 2018 May 15;8(1):7592. doi: 10.1038/s41598-018-26005-3.
7
enDNA-Prot: identification of DNA-binding proteins by applying ensemble learning.
Biomed Res Int. 2014;2014:294279. doi: 10.1155/2014/294279. Epub 2014 May 26.
8
DNA-Prot: identification of DNA binding proteins from protein sequence information using random forest.
J Biomol Struct Dyn. 2009 Jun;26(6):679-86. doi: 10.1080/07391102.2009.10507281.
9
newDNA-Prot: Prediction of DNA-binding proteins by employing support vector machine and a comprehensive sequence representation.
Comput Biol Chem. 2014 Oct;52:51-9. doi: 10.1016/j.compbiolchem.2014.09.002. Epub 2014 Sep 15.
10
nDNA-Prot: identification of DNA-binding proteins based on unbalanced classification.
BMC Bioinformatics. 2014 Sep 8;15(1):298. doi: 10.1186/1471-2105-15-298.