A novel alignment-free vector method to cluster protein sequences.

Author information

He Lily, Li Yongkun, He Rong Lucy, Yau Stephen S-T

Affiliations

Department of Mathematical Sciences, Tsinghua University, Beijing 100084, PR China.

Department of Biological Sciences, Chicago State University, Chicago, IL, USA.

Publication information

J Theor Biol. 2017 Aug 1;427:41-52. doi: 10.1016/j.jtbi.2017.06.002. Epub 2017 Jun 3.

Abstract

Classification of proteins is a crucial topic in biology. The number of protein sequences stored in databases has increased sharply over the past decade. Traditionally, protein sequences are compared using multiple sequence alignment methods. However, these methods may be unsuitable for clustering protein sequences when gene rearrangements occur, as in viral genomes. Alignment is also very time-consuming for large datasets with long genomes. In this paper, we propose a 24-dimensional feature vector describing the composition of amino acids in protein sequences, based on three important biochemical properties of amino acids: the hydropathy index, the polar requirement, and the chemical composition of the side chain. Our method uses not only the chemical properties of amino acids but also their numbers and positions. Results on beta-globin, mammal, and three virus datasets show that this new tool is fast and accurate for classifying proteins and inferring the phylogeny of organisms.
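The abstract does not spell out how the feature vector is built, so the following is only a rough sketch of the general idea of an alignment-free descriptor that combines property-based groups with counts and positions. The three-group partition `TOY_GROUPS`, the function names, and the choice of three statistics per group (count, mean position, normalized position variance, in the style of natural-vector descriptors) are all assumptions made for illustration. They are not the authors' 24-dimensional construction.

```python
from statistics import mean

# Hypothetical partition for illustration only; NOT the grouping used in the paper.
TOY_GROUPS = {
    "hydrophobic": set("AVLIMFWC"),
    "polar": set("GSTYNQPH"),
    "charged": set("DEKR"),
}


def group_features(seq, groups=TOY_GROUPS):
    """Return [count, mean position, normalized position variance] for each group.

    Positions are 1-based. The variance is divided by (count * sequence length),
    an assumed normalization borrowed from natural-vector style descriptors.
    """
    n = len(seq)
    vec = []
    for members in groups.values():
        positions = [i + 1 for i, aa in enumerate(seq) if aa in members]
        count = len(positions)
        if count == 0:
            # A group that never occurs contributes zeros.
            vec.extend([0.0, 0.0, 0.0])
            continue
        mu = mean(positions)
        var = sum((p - mu) ** 2 for p in positions) / (count * n)
        vec.extend([float(count), float(mu), var])
    return vec


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5
```

Because every sequence maps to a vector of the same fixed length, pairwise distances such as `euclidean(group_features(a), group_features(b))` can be computed without aligning `a` and `b`, and the resulting distance matrix can then be passed to any standard clustering or tree-building routine.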
