FermatS: A Novel Numerical Representation for Protein Sequence Comparison and DNA-binding Protein Identification.

Author information

Zhang Yanping, Gao Ya, Ni Jianwei, Chen Pengcheng, Wang Xiaosheng

Affiliations

School of Mathematics and Physics Science and Engineering, Hebei University of Engineering, Handan 056038, China.

Publication information

Comb Chem High Throughput Screen. 2021;24(10):1746-1753. doi: 10.2174/1386207323999201117111738.

DOI:10.2174/1386207323999201117111738

PMID:33208064

Abstract

AIMS

Using protein sequence information alone, a simple and effective method is applied to analyze protein sequence similarity and to predict DNA-binding proteins.

BACKGROUND

Given the rapidly growing volume of available molecular biology data, low-complexity computational methods are needed to accurately infer protein structure, function, and evolution.

OBJECTIVE

Given the rapidly growing volume of available molecular biology data, it is important to develop novel computational algorithms for analyzing and comparing protein sequences.

METHODS

The numerical characteristics of a protein sequence are obtained from global and local position representations, based respectively on Fermat spiral curves and on the normalized moments of inertia of those curves, together with the composition of the 20 amino acids.
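The abstract does not give the exact construction, so the following is only a minimal sketch of how such features could be computed. It assumes the Fermat spiral r = a * sqrt(theta), a hypothetical mapping in which the i-th residue is placed at angle i * step, and a moment of inertia normalized by the number of points. The names `fermat_point`, `spiral_curve`, `feature_vector`, and the parameters `a` and `step` are illustrative and are not taken from the paper.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def fermat_point(theta, a=1.0):
    """Cartesian point on the Fermat spiral r = a * sqrt(theta)."""
    r = a * math.sqrt(theta)
    return (r * math.cos(theta), r * math.sin(theta))


def spiral_curve(sequence, residue, step=math.pi / 10):
    """Assumed mapping: each 1-based position i holding `residue`
    is placed on the spiral at angle theta = i * step."""
    return [fermat_point(i * step)
            for i, aa in enumerate(sequence, start=1) if aa == residue]


def normalized_moment_of_inertia(points):
    """Mean squared distance of unit-mass points from their centroid
    (one possible normalization; the paper's may differ)."""
    if not points:
        return 0.0
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n


def composition(sequence):
    """Relative frequency of each of the 20 amino acids."""
    counts = Counter(sequence)
    return [counts[aa] / len(sequence) for aa in AMINO_ACIDS]


def feature_vector(sequence):
    """20 per-residue moments followed by 20 composition values."""
    moments = [normalized_moment_of_inertia(spiral_curve(sequence, aa))
               for aa in AMINO_ACIDS]
    return moments + composition(sequence)


if __name__ == "__main__":
    print(len(feature_vector("MKTAYIAKQR")))
```

Under these assumptions every sequence, whatever its length, is reduced to a fixed-length vector of 40 numbers, so two sequences could be compared with an ordinary vector distance.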

RESULTS

The method was applied to analyze the similarity/dissimilarity of nine ND5 proteins, and the results are consistent with biological evolution theory. Furthermore, we used logistic regression with 5-fold cross-validation to build a prediction model for DNA-binding proteins; on an unbalanced dataset it outperformed DNAbinder, iDNA-prot, DNA-prot, and gDNA-prot by 0.0069-0.609 in F-measure and by 0.293-0.898 in MCC.
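For readers unfamiliar with the two reported metrics, the sketch below computes the F-measure and the Matthews correlation coefficient (MCC) from binary labels using their standard definitions. The toy labels are invented for illustration and are not data from the paper; the function names are likewise illustrative.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall: 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient; defined as 0 when any margin is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Toy unbalanced example: 2 positives, 8 negatives (illustrative only).
    y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    print(f_measure(tp, fp, fn), mcc(tp, fp, fn, tn))
```

MCC uses all four cells of the confusion matrix, including true negatives, which is why it is often reported alongside the F-measure when the classes are unbalanced.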

CONCLUSION

These results show that our method, FermatS, is effective for comparing, recognizing, and predicting protein sequences.
