College of Science, Northwest A&F University, Yangling, Shaanxi 712100, PR China.
Comput Biol Chem. 2019 Jun;80:10-15. doi: 10.1016/j.compbiolchem.2019.01.005. Epub 2019 Jan 18.
Sequence comparison is an important topic in bioinformatics. With the exponential growth in the number of biological sequences, traditional alignment-based methods for protein sequence comparison have become limited, so many alignment-free methods have been proposed over the past two decades. In this paper, we consider not only six typical physicochemical properties of amino acids but also amino acid frequency and positional distribution, and obtain a 51-dimensional vector to describe each protein sequence. We then compute a pairwise distance matrix using the standardized Euclidean distance, on which discriminant analysis and phylogenetic analysis can be performed. The results on the Influenza A virus and ND5 datasets indicate that our method is accurate and efficient for classifying proteins and inferring the phylogeny of species.
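The distance step can be illustrated with a minimal sketch. The abstract does not spell out how the 51-dimensional vector is built, so the `toy_features` function below is a hypothetical 40-dimensional stand-in (frequency and mean relative position of each amino acid) and is not the descriptor used in the paper. The sequences are made up. The standardized Euclidean distance is taken here in its usual form, with each squared difference divided by that component's sample variance across the dataset; skipping zero-variance components is an assumption of this sketch.

```python
import math
from statistics import variance

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_features(seq):
    """Hypothetical descriptor (NOT the paper's 51-dim vector):
    for each amino acid, its frequency and its mean relative position."""
    n = len(seq)
    feats = []
    for aa in AMINO_ACIDS:
        positions = [i + 1 for i, c in enumerate(seq) if c == aa]
        freq = len(positions) / n
        # Mean 1-based position scaled by sequence length; 0.0 if the residue is absent.
        mean_pos = sum(positions) / (len(positions) * n) if positions else 0.0
        feats.extend([freq, mean_pos])
    return feats


def seuclidean_matrix(vectors):
    """Pairwise standardized Euclidean distances between feature vectors.

    Each squared component difference is divided by that component's sample
    variance over all vectors. Zero-variance components are skipped
    (an assumption of this sketch; they contribute no difference anyway).
    """
    dim = len(vectors[0])
    var = [variance([v[k] for v in vectors]) for k in range(dim)]
    m = len(vectors)
    dist = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            s = sum(
                (vectors[i][k] - vectors[j][k]) ** 2 / var[k]
                for k in range(dim)
                if var[k] > 0
            )
            dist[i][j] = dist[j][i] = math.sqrt(s)
    return dist


# Made-up toy sequences: the first two differ only in the last residue.
seqs = ["MKTAYIAKQR", "MKTAYIAKQL", "GGSWWPLHHC"]
D = seuclidean_matrix([toy_features(s) for s in seqs])
for row in D:
    print(["%.3f" % d for d in row])
```

A matrix such as `D` is the kind of input that distance-based tree-building or classification procedures take; which procedures the paper applies to it is not stated in the abstract.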