Department of General Biology, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Av. Antônio Carlos, 6627, Belo Horizonte, MG, 31270-901, Brazil.
BMC Genomics. 2011 Dec 22;12 Suppl 4(Suppl 4):S11. doi: 10.1186/1471-2164-12-S4-S11.
Singular value decomposition (SVD) is a powerful technique for information retrieval; it helps uncover relationships between elements that are not prima facie related. SVD-based retrieval methods were initially developed to reduce the time needed for information retrieval and analysis of very large data sets in the complex internet environment. Because information retrieval from large-scale genome and proteome data sets is similarly complex, SVD-based methods could also facilitate data analysis in this research area.
We found that SVD applied to amino acid sequences reveals relationships among them and provides a basis for producing clusters and cladograms that reflect the evolutionary relatedness of species and correlate well with Linnaean taxonomy. The choice of a reasonable number of singular values is crucial for SVD-based studies. We found that fewer singular values are needed to produce biologically significant clusters when SVD is employed. We then developed a method to determine the lowest number of singular values and the fewest clusters needed to guarantee biological significance; the method was both developed and validated by comparison with Linnaean taxonomic classification.
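The pipeline described above can be sketched in Python as a hypothetical illustration, not the authors' implementation. It assumes that each sequence is encoded as overlapping dipeptide (k = 2) counts, keeps the top singular directions, groups sequences by single-linkage clustering on cosine similarity, and takes as the "lowest number of singular values" the smallest rank whose clusters exactly reproduce a reference labelling such as genus names. To stay dependency-free it obtains the left singular vectors from the eigen-decomposition of A·Aᵀ with the cyclic Jacobi method; with NumPy one would call numpy.linalg.svd instead. All function names and the toy sequences are illustrative.

```python
import math
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def kmer_matrix(seqs, k=2):
    # Assumed encoding: one row per sequence, one column per possible k-mer.
    vocab = {"".join(p): i for i, p in enumerate(product(AMINO_ACIDS, repeat=k))}
    rows = []
    for seq in seqs:
        row = [0.0] * len(vocab)
        for i in range(len(seq) - k + 1):
            j = vocab.get(seq[i:i + k])
            if j is not None:  # k-mers containing non-standard residues are skipped
                row[j] += 1.0
        rows.append(row)
    return rows

def jacobi_eigh(S, sweeps=100, tol=1e-20):
    # Cyclic Jacobi eigen-decomposition of a small symmetric matrix; column c of V pairs with eigenvalue c.
    n = len(S)
    A = [row[:] for row in S]
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for M in (A, V):  # right-multiply by the rotation (columns p and q)
                    for r in range(n):
                        mp, mq = M[r][p], M[r][q]
                        M[r][p], M[r][q] = c * mp - s * mq, s * mp + c * mq
                for r in range(n):  # left-multiply A by the transposed rotation (rows p and q)
                    ap, aq = A[p][r], A[q][r]
                    A[p][r], A[q][r] = c * ap - s * aq, s * ap + c * aq
    return [A[i][i] for i in range(n)], V

def svd_coordinates(rows):
    # Left singular vectors of A are eigenvectors of A·Aᵀ with eigenvalues σ².
    # Returns the rows of U·Σ, columns ordered by decreasing singular value.
    n = len(rows)
    gram = [[sum(a * b for a, b in zip(rows[i], rows[j])) for j in range(n)] for i in range(n)]
    vals, vecs = jacobi_eigh(gram)
    order = sorted(range(n), key=lambda c: vals[c], reverse=True)
    return [[vecs[i][c] * math.sqrt(max(vals[c], 0.0)) for c in order] for i in range(n)]

def cosine(x, y):
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    if nx == 0.0 or ny == 0.0:
        return 0.0  # a sequence projected onto the origin is treated as unrelated
    return sum(a * b for a, b in zip(x, y)) / (nx * ny)

def single_linkage(points, n_clusters):
    # Agglomerative clustering: merge the most similar pairs first (union-find).
    n = len(points)
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i
    pairs = sorted(((cosine(points[i], points[j]), i, j)
                    for i in range(n) for j in range(i + 1, n)), reverse=True)
    remaining = n
    for _, i, j in pairs:
        if remaining <= n_clusters:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            remaining -= 1
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), set()).add(i)
    return {frozenset(g) for g in groups.values()}

def minimal_rank(seqs, labels, k=2):
    # Hypothetical criterion: the smallest rank whose clusters exactly reproduce
    # the reference grouping in `labels` (e.g. genus names from Linnaean taxonomy).
    groups = {}
    for i, label in enumerate(labels):
        groups.setdefault(label, set()).add(i)
    target = {frozenset(g) for g in groups.values()}
    coords = svd_coordinates(kmer_matrix(seqs, k))
    for rank in range(1, len(seqs) + 1):
        if single_linkage([row[:rank] for row in coords], len(target)) == target:
            return rank
    return None  # no rank reproduces the reference grouping

seqs = ["ACACAC", "CACACA", "WYWYWYW", "YWYWYW"]  # toy sequences, not real data
labels = ["genus_1", "genus_1", "genus_2", "genus_2"]
print(minimal_rank(seqs, labels))  # prints 2
```

On this toy input the rank-1 projection keeps only the dominant direction of the second group, so both first-group sequences collapse onto the origin and the clusters do not match; rank 2 is the first that separates both groups. Single linkage is used here only for brevity; the clustering and cladogram method used in the study may differ.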
Using SVD, we can reduce the uncertainty about the appropriate rank value (the number of singular values retained) needed for accurate information retrieval analyses. In tests, the clusters we obtained with SVD matched the groupings expected from Linnaean taxonomy perfectly.
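Here the rank value is read as the number k of singular values kept. In standard notation (a general statement of the truncated SVD, not a formula quoted from the paper), for a sequence-by-feature matrix A of rank r:

```latex
\begin{aligned}
A &= U \Sigma V^{\top}, \qquad \Sigma = \operatorname{diag}(\sigma_1, \ldots, \sigma_r), \quad \sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0 \\
A_k &= U_k \Sigma_k V_k^{\top} = \sum_{i=1}^{k} \sigma_i \, \mathbf{u}_i \mathbf{v}_i^{\top}, \qquad 1 \le k \le r \\
\lVert A - A_k \rVert_F &= \min_{\operatorname{rank}(B) \le k} \lVert A - B \rVert_F = \Big( \sum_{i=k+1}^{r} \sigma_i^{2} \Big)^{1/2}
\end{aligned}
```

By the Eckart–Young theorem, A_k is the best rank-k approximation of A in the Frobenius norm, so choosing k fixes how much of the variation carried by the smaller singular values is discarded.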