School of Electronics and Information Engineering, Tongji University, 4800 Caoan Road, Shanghai 201804, China.
IEEE/ACM Trans Comput Biol Bioinform. 2013 Mar-Apr;10(2):457-67. doi: 10.1109/TCBB.2013.10.
Based on all kinds of adjacent amino acids (AAA), we map each protein primary sequence of length L into a 400 × (L − 1) matrix M. From the singular value decomposition (SVD) of this matrix, we then derive a normalized 400-tuple mathematical descriptor D for the sequence. The resulting 400-dimensional normalized feature vectors (NFVs) enable quantitative analysis of protein sequences. Using this normalized representation, we analyze the similarity between sequences on two data sets: 1) ND5 sequences from nine species and 2) transferrin sequences from 24 vertebrates. We also compare our results with those of related works. The two experiments indicate that the proposed NFV-AAA approach performs well in protein sequence similarity analysis.
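The pipeline described above can be illustrated with a minimal sketch. The abstract does not specify how the entries of M are defined or how D is read off the SVD, so everything here is an assumption made for illustration only: M is taken to be a one-hot indicator matrix (column j marks which of the 400 ordered residue pairs occurs at positions j and j+1), D is taken to be the singular values of M zero-padded to length 400 and scaled to unit Euclidean norm, and similarity is measured by Euclidean distance. The names `aaa_matrix`, `nfv`, and `distance` are hypothetical and do not come from the paper.

```python
import numpy as np

# The 20 standard amino acids; the ordering is an arbitrary choice for this sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Row index for each of the 20 * 20 = 400 ordered adjacent pairs.
PAIR_INDEX = {a + b: 20 * i + j
              for i, a in enumerate(AMINO_ACIDS)
              for j, b in enumerate(AMINO_ACIDS)}


def aaa_matrix(seq):
    """Build a 400 x (L-1) matrix (ASSUMED one-hot encoding of adjacent pairs)."""
    if len(seq) < 2:
        raise ValueError("sequence must contain at least two residues")
    M = np.zeros((400, len(seq) - 1))
    for pos in range(len(seq) - 1):
        # A non-standard residue raises KeyError here.
        M[PAIR_INDEX[seq[pos:pos + 2]], pos] = 1.0
    return M


def nfv(seq):
    """Return a unit-norm 400-vector from the SVD of M (ASSUMED construction)."""
    # Singular values only, sorted in descending order; length is min(400, L-1).
    s = np.linalg.svd(aaa_matrix(seq), compute_uv=False)
    d = np.zeros(400)
    d[:len(s)] = s  # zero-pad when L-1 < 400
    return d / np.linalg.norm(d)


def distance(seq_a, seq_b):
    """Euclidean distance between two feature vectors (ASSUMED similarity measure)."""
    return float(np.linalg.norm(nfv(seq_a) - nfv(seq_b)))
```

For example, `distance(seq_a, seq_b)` on two protein strings returns a non-negative number, with smaller values meaning more similar vectors under these assumptions. Note that with a one-hot M, the singular values equal the square roots of the pair counts in sorted order, so this sketch discards which pair produced which count; the published method may well define M or D differently.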