School of Agriculture and Hydraulic Engineering, Suihua University, Suihua 152061, China.
School of Information Engineering, Suihua University, Suihua 152061, China.
Genomics. 2019 Dec;111(6):1298-1305. doi: 10.1016/j.ygeno.2018.08.010. Epub 2018 Sep 5.
Based on the k-mer model for protein sequences, a novel k-mer natural vector method is proposed to characterize the features of the k-mers in a protein sequence, taking into account both the numbers and the distributions of the k-mers. It is proved that the relationship between a protein sequence and its k-mer natural vector is one-to-one. Phylogenetic analysis of protein sequences can therefore be performed easily, without requiring evolutionary models or human intervention. In addition, there is no criterion for choosing a suitable k, and the choice of k strongly affects both the results and the computational complexity. In this paper, a compound k-mer natural vector is used to quantify each protein sequence. The results of phylogenetic analyses on three protein datasets demonstrate that the new method can precisely describe the evolutionary relationships of proteins and greatly improve computational efficiency.
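To make the idea of "numbers and distributions of k-mers" concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the usual natural vector moments for each k-mer: its count, its mean (1-based) start position, and its second central moment of position normalized by the count and by the number of k-mer windows. The abstract does not give the exact normalization or the construction of the compound vector, so both are assumptions here; in particular, the compound vector is taken to be a simple concatenation over several values of k. The names `kmer_natural_vector` and `compound_vector` are hypothetical.

```python
from collections import defaultdict
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def kmer_natural_vector(seq, k):
    """Map each k-mer present in seq to (count, mean position, second moment).

    Assumed definitions (not taken from the paper): positions are 1-based
    start indices, and the second central moment is divided by
    count * number_of_windows.
    """
    total = len(seq) - k + 1  # number of k-mer windows in the sequence
    positions = defaultdict(list)
    for i in range(total):
        positions[seq[i:i + k]].append(i + 1)

    features = {}
    for kmer, pos in positions.items():
        n = len(pos)
        mu = sum(pos) / n
        d2 = sum((p - mu) ** 2 for p in pos) / (n * total)
        features[kmer] = (n, mu, d2)
    return features


def compound_vector(seq, ks=(1, 2)):
    """Concatenate the k-mer features for every k in ks (assumed construction).

    K-mers that do not occur in seq contribute (0, 0.0, 0.0), so all
    sequences yield vectors of the same length and fixed ordering.
    """
    vec = []
    for k in ks:
        feats = kmer_natural_vector(seq, k)
        for kmer in map("".join, product(AMINO_ACIDS, repeat=k)):
            vec.extend(feats.get(kmer, (0, 0.0, 0.0)))
    return vec
```

Because every sequence is mapped to a vector of the same length, pairwise distances (for example Euclidean) between such vectors could be fed to a standard tree-building method. Note that the vector length grows as 3 · 20^k, which illustrates why the choice of k affects computational cost.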