Fogolari F, Tessari S, Molinari H
Dipartimento Scientifico Tecnologico, Facoltà di Scienze, Università di Verona, Verona, Italy.
Proteins. 2002 Feb 1;46(2):161-70. doi: 10.1002/prot.10032.
One of the standard tools for the analysis of data arranged in matrix form is singular value decomposition (SVD). Few applications to genomic data have been reported to date, mainly for the analysis of gene expression microarray data. We review SVD properties, examine the mathematical terms and assumptions implicit in the SVD formalism, and show that SVD can be applied to the analysis of matrices representing pairwise alignment scores between large sets of protein sequences. In particular, we illustrate SVD capabilities for data dimension reduction and for clustering protein sequences. SVD-generated clusters of proteins are compared with the annotation reported in the SWISS-PROT Database for a set of protein sequences forming the calycin superfamily, comprising all entries corresponding to the lipocalin, cytosolic fatty acid-binding protein, and avidin-streptavidin Prosite patterns.
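The idea of applying SVD to a pairwise score matrix can be illustrated with a minimal sketch. It is not the authors' procedure: the toy score values, the helper name `svd_embed`, and the sign-based split into two groups are all assumptions chosen for illustration. Each row of the matrix is treated as the score profile of one sequence, the rows are projected onto the leading singular vectors (dimension reduction), and the reduced coordinates are used to separate the sequences into groups (clustering).

```python
import numpy as np

def svd_embed(scores, k):
    """Hypothetical helper: reduce each row (one sequence's score
    profile) to k coordinates using the top-k singular triplets."""
    U, s, Vt = np.linalg.svd(scores, full_matrices=False)
    # Row coordinates in the reduced space: U_k scaled by the singular values.
    return U[:, :k] * s[:k], s

# Toy symmetric score matrix (assumed values): two families of three
# sequences each, with high scores within a family and low scores between.
within = np.full((3, 3), 50.0)
between = np.full((3, 3), 5.0)
scores = np.block([[within, between], [between, within]])

# Add small symmetric noise so the matrix is not exactly rank 2.
rng = np.random.default_rng(0)
noise = rng.normal(scale=1.0, size=scores.shape)
scores = scores + (noise + noise.T) / 2

coords, singular_values = svd_embed(scores, k=2)

# The first coordinate captures the overall score level; the second
# separates the two families. Splitting on its sign is a crude stand-in
# for a real clustering step (the sign itself is arbitrary).
labels = (coords[:, 1] > 0).astype(int)

print("singular values:", np.round(singular_values, 1))
print("cluster labels: ", labels)
```

In this toy case two singular values dominate the spectrum, which is the signal that two dimensions suffice to describe the data. With real alignment scores, the number of retained dimensions and the clustering method applied to the reduced coordinates would both need to be chosen from the data.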