A singular value decomposition approach for improved taxonomic classification of biological sequences.

Affiliation

Department of General Biology, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Av. Antônio Carlos, 6627, Belo Horizonte, MG, 31270-901, Brazil.

Publication information

BMC Genomics. 2011 Dec 22;12 Suppl 4(Suppl 4):S11. doi: 10.1186/1471-2164-12-S4-S11.

DOI:10.1186/1471-2164-12-S4-S11
PMID:22369633
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3287580/
Abstract

BACKGROUND

Singular value decomposition (SVD) is a powerful technique for information retrieval; it helps uncover relationships between elements that are not prima facie related. SVD was initially developed to reduce the time needed for information retrieval and analysis of very large data sets in the complex internet environment. Since information retrieval from large-scale genome and proteome data sets has a similar level of complexity, SVD-based methods could also facilitate data analysis in this research area.
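The Background's claim that SVD can relate items that are not prima facie related can be illustrated with a minimal latent semantic indexing (LSI) sketch. The toy term-by-document matrix, the function names, and the choice of rank 1 are hypothetical illustrations, not taken from the paper, and NumPy is assumed to be available. Documents d0 and d2 share no terms, so their raw cosine similarity is 0; after projecting onto the leading singular direction they become similar because both overlap with d1.

```python
import numpy as np


def reduced_doc_vectors(term_doc, k):
    """Project documents (columns of a term-by-document matrix) onto the
    top-k singular directions, as in latent semantic indexing (LSI)."""
    _, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    # Each row is one document in k latent dimensions: (Sigma_k V_k^T)^T.
    return (np.diag(s[:k]) @ vt[:k, :]).T


def cosine(a, b):
    """Cosine similarity of two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical toy matrix: rows are terms t0..t2, columns are documents d0..d2.
# d0 = {t0, t1}, d1 = {t1, t2}, d2 = {t2}: d0 and d2 share no terms.
A = np.array([
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0],
])

raw = cosine(A[:, 0], A[:, 2])            # no shared terms, so exactly 0.0
docs_k1 = reduced_doc_vectors(A, k=1)
latent = cosine(docs_k1[0], docs_k1[2])   # close to 1.0 after rank-1 truncation
print(raw, latent)
```

Rank 1 is deliberately extreme so the effect is easy to check by hand; in practice more singular values are kept, and choosing how many is the question the Results section calls crucial.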

RESULTS

We found that SVD applied to amino acid sequences reveals relationships among them and provides a basis for producing clusters and cladograms that reflect the evolutionary relatedness of species and correlate well with Linnaean taxonomy. The choice of a reasonable number of singular values is crucial for SVD-based studies. We found that fewer singular values are needed to produce biologically significant clusters when SVD is employed. Subsequently, we developed a method to determine the lowest number of singular values and the fewest clusters needed to guarantee biological significance; this system was developed and validated by comparison with Linnaean taxonomic classification.
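A minimal end-to-end sketch of the workflow described above, under several assumptions that are not from the paper: sequences are represented by dipeptide (k = 2) counts playing the role of LSI "terms", sequences are compared by cosine distance in the truncated SVD space, clusters come from a simple average-linkage (UPGMA-style) merge rule, and the hypothetical helper `smallest_matching_rank` scans ranks upward until the clusters reproduce a reference labelling that stands in for Linnaean taxonomy. It illustrates the idea of searching for the lowest adequate number of singular values; it is not the authors' published procedure, and the four toy sequences are invented.

```python
import numpy as np
from collections import Counter


def kmer_matrix(seqs, k=2):
    """Sequence-by-k-mer count matrix; k-mers act as the 'terms' (assumption)."""
    counts = [Counter(s[i:i + k] for i in range(len(s) - k + 1)) for s in seqs]
    vocab = sorted(set().union(*counts))
    return np.array([[c[w] for w in vocab] for c in counts], dtype=float)


def latent_vectors(x, rank):
    """Rows of U_r * Sigma_r: each sequence expressed in `rank` latent dimensions."""
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :rank] * s[:rank]


def cosine_distances(vecs, tol=1e-9):
    """Pairwise 1 - cosine similarity; vectors that vanish at this rank get distance 1."""
    norms = np.linalg.norm(vecs, axis=1)
    alive = norms > tol * norms.max()
    n = len(vecs)
    dist = np.ones((n, n))
    for i in range(n):
        for j in range(n):
            if alive[i] and alive[j]:
                dist[i, j] = 1.0 - vecs[i] @ vecs[j] / (norms[i] * norms[j])
    return dist


def average_linkage(dist, n_clusters):
    """Agglomerative clustering with the average-linkage (UPGMA) merge rule."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = np.mean([dist[i, j] for i in clusters[a] for j in clusters[b]])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters.pop(b))  # b > a, so index a is unaffected
    labels = [0] * len(dist)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def same_partition(labels, reference):
    """True if both labellings group exactly the same pairs together."""
    n = len(labels)
    return all((labels[i] == labels[j]) == (reference[i] == reference[j])
               for i in range(n) for j in range(i + 1, n))


def smallest_matching_rank(x, reference):
    """Hypothetical rank scan: lowest rank whose clusters reproduce `reference`."""
    n_clusters = len(set(reference))
    for rank in range(1, min(x.shape) + 1):
        labels = average_linkage(cosine_distances(latent_vectors(x, rank)), n_clusters)
        if same_partition(labels, reference):
            return rank, labels
    return None, None


seqs = ["MKVLAAGMKVLAAG", "MKVLAAGMKVLAAS", "WWPPHHWWPPHH", "WWPPHHWWPPHC"]
reference = ["groupA", "groupA", "groupB", "groupB"]  # stand-in for taxonomic labels
rank, labels = smallest_matching_rank(kmer_matrix(seqs), reference)
print(rank, labels)  # prints: 2 [0, 0, 1, 1]
```

On this toy input, rank 1 keeps only the direction dominated by the first group, so the second group's vectors vanish and the partition fails; rank 2 separates the two groups. Real protein sets would likely need a different sequence representation, a tested tree-building library, and a more careful comparison with taxonomy.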

CONCLUSIONS

By using SVD, we can reduce uncertainty concerning the appropriate rank value necessary to perform accurate information retrieval analyses. In tests, clusters that we developed with SVD perfectly matched what was expected based on Linnaean taxonomy.
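For reference, the "rank value" in the Conclusions is the number k of singular values retained. By the Eckart-Young-Mirsky theorem, the rank-k truncation is the best rank-k approximation of the data matrix in the Frobenius norm, and its error is set entirely by the discarded singular values. This is standard linear algebra rather than a result of the paper, and it does not by itself say which k is biologically meaningful.

```latex
A = U \Sigma V^{\top} = \sum_{i=1}^{r} \sigma_i \, \mathbf{u}_i \mathbf{v}_i^{\top},
\qquad \sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > 0

A_k = \sum_{i=1}^{k} \sigma_i \, \mathbf{u}_i \mathbf{v}_i^{\top},
\qquad
\min_{\operatorname{rank}(B) \le k} \lVert A - B \rVert_F
  = \lVert A - A_k \rVert_F
  = \sqrt{\sum_{i=k+1}^{r} \sigma_i^{2}}
```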

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fcf6/3287580/56c28776a614/1471-2164-12-S4-S11-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fcf6/3287580/641abe143b12/1471-2164-12-S4-S11-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fcf6/3287580/cbec20aaac7d/1471-2164-12-S4-S11-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fcf6/3287580/a8f427afd697/1471-2164-12-S4-S11-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fcf6/3287580/d1a89c399d70/1471-2164-12-S4-S11-5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fcf6/3287580/4713363624ed/1471-2164-12-S4-S11-6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fcf6/3287580/53aa3c33778f/1471-2164-12-S4-S11-7.jpg
