A novel method for similarity analysis and protein sub-cellular localization prediction.

Affiliation

School of Computer and Communication, Hunan University, Changsha, Hunan, China.

Publication information

Bioinformatics. 2010 Nov 1;26(21):2678-83. doi: 10.1093/bioinformatics/btq521. Epub 2010 Sep 8.

Abstract

MOTIVATION

Many biologists regard biological sequences as an important object of study, because sequences carry a large amount of biological information that helps scientists study cells, DNA and proteins. Many researchers currently use methods based on protein sequences, including some machine-learning methods, for function classification, sub-cellular localization, and structure and functional-site prediction. The purpose of this article is to find a new approach to sequence analysis that is simpler and more effective.

RESULTS

Based on the properties of the 64 genetic codes, we propose a simple and intuitive 2D graphical representation of protein sequences. Building on this representation, we give a new Euclidean-distance method that computes the distance between sequences in order to analyze their similarity. This approach captures more sequence information. A typical phylogenetic tree constructed with this method demonstrates the effectiveness of the approach. Finally, we use this sequence-similarity analysis to predict protein sub-cellular localization on two commonly used datasets. The results show that the method is reasonable.
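
The pipeline described in the results (2D representation, Euclidean distance, similarity-based prediction) can be illustrated with a minimal Python sketch. The abstract does not give the coordinates derived from the 64 genetic codes, the descriptor extracted from the curve, or the classifier, so everything below is an assumption made for illustration: the `STEP` mapping simply spreads the 20 amino acids around a unit circle, `features` uses the mean point of the curve, and `predict` is a plain 1-nearest-neighbour rule. The sequences and labels in the usage lines are made-up toy data, not results from the paper.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical placeholder mapping: one unit step per amino acid, evenly
# spaced around a circle. The paper derives its coordinates from the
# 64 genetic codes; those values are not given in the abstract.
STEP = {
    aa: (math.cos(2 * math.pi * i / 20), math.sin(2 * math.pi * i / 20))
    for i, aa in enumerate(AMINO_ACIDS)
}


def curve(seq):
    """Turn a protein sequence into a 2D curve: one cumulative point per residue.

    Raises KeyError for characters outside the 20 standard amino acids.
    """
    x = y = 0.0
    points = []
    for aa in seq:
        dx, dy = STEP[aa]
        x += dx
        y += dy
        points.append((x, y))
    return points


def features(seq):
    """Assumed fixed-length descriptor: the mean point of the curve.

    A fixed-length vector lets sequences of different lengths be compared.
    """
    pts = curve(seq)
    n = len(pts)
    return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)


def distance(a, b):
    """Euclidean distance between the descriptors of two sequences."""
    return math.dist(features(a), features(b))


def predict(query, labelled):
    """1-nearest-neighbour: return the label of the closest labelled sequence."""
    return min(labelled, key=lambda item: distance(query, item[0]))[1]


# Toy usage with made-up sequences and labels.
training = [("MKVLAA", "cytoplasm"), ("GGHWWY", "nucleus")]
label = predict("MKVLAA", training)
```

A pairwise matrix of `distance` values over a set of sequences is also the kind of input a distance-based tree-building method would take, which is one plausible route to the phylogenetic tree mentioned above.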
