A novel method for similarity analysis and protein sub-cellular localization prediction.

Affiliation

School of computer and communication, Hunan University, Changsha, Hunan, China.

Publication information

Bioinformatics. 2010 Nov 1;26(21):2678-83. doi: 10.1093/bioinformatics/btq521. Epub 2010 Sep 8.

DOI:10.1093/bioinformatics/btq521

PMID:20826879

Abstract

MOTIVATION

Many biologists regard biological sequences as an important object of study because sequences carry a large amount of biological information, which helps scientists study cells, DNA and proteins. Many researchers currently use protein-sequence-based methods, including some machine-learning methods, for function classification, sub-cellular localization, and structure and functional-site prediction. The purpose of this article is to find a new approach to sequence analysis that is simpler and more effective.

RESULTS

Based on the properties of the 64 codons of the genetic code, we propose a simple and intuitive 2D graphical representation of protein sequences. Using this representation, we give a new Euclidean-distance method that computes the distance between sequences for similarity analysis. This approach retains more sequence information. A typical phylogenetic tree constructed with this method demonstrates the effectiveness of the approach. Finally, we apply this sequence-similarity-analysis method to protein sub-cellular localization prediction on two commonly used datasets. The results show that the method is reasonable.
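The general pipeline described in the abstract (map a sequence to a 2D curve, reduce the curve to a numeric descriptor, compare descriptors by Euclidean distance, and predict a label from the most similar known sequence) can be illustrated with a minimal sketch. The abstract does not give the codon-based coordinates or the descriptor used in the paper, so everything specific here is an assumption made for illustration only: the unit-circle coordinates in `COORDS`, the centroid descriptor, the nearest-neighbour rule in `predict`, and all function names are hypothetical and are not the authors' method.

```python
import math

# ASSUMPTION: each of the 20 standard amino acids gets an arbitrary point on
# the unit circle. This is NOT the codon-based mapping from the paper.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
COORDS = {
    aa: (math.cos(2 * math.pi * i / 20), math.sin(2 * math.pi * i / 20))
    for i, aa in enumerate(AMINO_ACIDS)
}


def curve(seq):
    """Turn a sequence into a 2D curve: point k is the sum of the first k residue vectors."""
    x = y = 0.0
    points = []
    for aa in seq:
        dx, dy = COORDS[aa]  # raises KeyError for non-standard residues
        x += dx
        y += dy
        points.append((x, y))
    return points


def descriptor(seq):
    """ASSUMPTION: summarise the curve by its centroid (mean x, mean y).

    Expects a non-empty sequence.
    """
    pts = curve(seq)
    n = len(pts)
    return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)


def distance(seq_a, seq_b):
    """Euclidean distance between the descriptors of two sequences."""
    return math.dist(descriptor(seq_a), descriptor(seq_b))


def predict(query, labelled):
    """ASSUMPTION: nearest-neighbour rule; return the label of the closest known sequence."""
    _, label = min(labelled, key=lambda item: distance(query, item[0]))
    return label


if __name__ == "__main__":
    # Toy sequences and labels, invented purely to exercise the functions.
    train = [("ACDEFG", "cytoplasm"), ("WYVTSR", "nucleus")]
    print(distance("ACDEFG", "WYVTSR"))
    print(predict("ACDEFH", train))
```

A pairwise distance matrix built with `distance` could also be passed to a standard tree-building procedure to obtain a phylogenetic tree of the kind the abstract mentions; that step is omitted here.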

Similar articles

On the quality of tree-based protein classification.

Bioinformatics. 2005 May 1;21(9):1876-90. doi: 10.1093/bioinformatics/bti244. Epub 2005 Jan 12.

A novel method to analyze the similarity of biological sequences.

J Biomol Struct Dyn. 2009 Apr;26(5):599-608. doi: 10.1080/07391102.2009.10507275.

A new approach to prediction of short-range conformational propensities in proteins.

Bioinformatics. 2005 Apr 1;21(7):981-7. doi: 10.1093/bioinformatics/bti080. Epub 2004 Oct 27.

Amino Acids. 2013 Feb;44(2):573-80. doi: 10.1007/s00726-012-1374-z. Epub 2012 Aug 1.

A neural network method for prediction of beta-turn types in proteins using evolutionary information.

Bioinformatics. 2004 Nov 1;20(16):2751-8. doi: 10.1093/bioinformatics/bth322. Epub 2004 May 14.

Blast sampling for structural and functional analyses.

BMC Bioinformatics. 2007 Feb 23;8:62. doi: 10.1186/1471-2105-8-62.

HYPROSP II--a knowledge-based hybrid method for protein secondary structure prediction based on local prediction confidence.

Bioinformatics. 2005 Aug 1;21(15):3227-33. doi: 10.1093/bioinformatics/bti524. Epub 2005 Jun 2.

Protein structural similarity search by Ramachandran codes.

BMC Bioinformatics. 2007 Aug 23;8:307. doi: 10.1186/1471-2105-8-307.

Analysis and prediction of functional sub-types from protein sequence alignments.

J Mol Biol. 2000 Oct 13;303(1):61-76. doi: 10.1006/jmbi.2000.4036.

Cited by

New distance measure for comparing protein using cellular automata image.

PLoS One. 2023 Oct 5;18(10):e0287880. doi: 10.1371/journal.pone.0287880. eCollection 2023.

Gene function prediction based on combining gene ontology hierarchy with multi-instance multi-label learning.

RSC Adv. 2018 Aug 10;8(50):28503-28509. doi: 10.1039/c8ra05122d. eCollection 2018 Aug 7.

Determination of k-mer density in a DNA sequence and subsequent cluster formation algorithm based on the application of electronic filter.

Sci Rep. 2021 Jul 1;11(1):13701. doi: 10.1038/s41598-021-93154-3.

Inactivation of Interferon Regulatory Factor 1 Causes Susceptibility to Colitis-Associated Colorectal Cancer.

Sci Rep. 2019 Dec 11;9(1):18897. doi: 10.1038/s41598-019-55378-2.

Predicting Influenza Antigenicity by Matrix Completion With Antigen and Antiserum Similarity.

Front Microbiol. 2018 Oct 23;9:2500. doi: 10.3389/fmicb.2018.02500. eCollection 2018.

One novel representation of DNA sequence based on the global and local position information.

Sci Rep. 2018 May 15;8(1):7592. doi: 10.1038/s41598-018-26005-3.

Protein Sequence Comparison and DNA-binding Protein Identification with Generalized PseAAC and Graphical Representation.

Comb Chem High Throughput Screen. 2018;21(2):100-110. doi: 10.2174/1386207321666180130100838.

Protein Sequence Comparison Based on Physicochemical Properties and the Position-Feature Energy Matrix.

Sci Rep. 2017 Apr 10;7:46237. doi: 10.1038/srep46237.

An Alignment-Free Algorithm in Comparing the Similarity of Protein Sequences Based on Pseudo-Markov Transition Probabilities among Amino Acids.

PLoS One. 2016 Dec 5;11(12):e0167430. doi: 10.1371/journal.pone.0167430. eCollection 2016.

ADLD: a novel graphical representation of protein sequences and its application.

Comput Math Methods Med. 2014;2014:959753. doi: 10.1155/2014/959753. Epub 2014 Oct 30.
