Singular value decomposition analysis of protein sequence alignment score data.

Authors

Fogolari F, Tessari S, Molinari H

Affiliation

Dipartimento Scientifico Tecnologico, Facoltà di Scienze, Università di Verona, Verona, Italy.

Publication details

Proteins. 2002 Feb 1;46(2):161-70. doi: 10.1002/prot.10032.

DOI: 10.1002/prot.10032
PMID: 11807944

Abstract

One of the standard tools for the analysis of data arranged in matrix form is singular value decomposition (SVD). Few applications to genomic data have been reported to date mainly for the analysis of gene expression microarray data. We review SVD properties, examine mathematical terms and assumptions implicit in the SVD formalism, and show that SVD can be applied to the analysis of matrices representing pairwise alignment scores between large sets of protein sequences. In particular, we illustrate SVD capabilities for data dimension reduction and for clustering protein sequences. A comparison is performed between SVD-generated clusters of proteins and annotation reported in the SWISS-PROT Database for a set of protein sequences forming the calycin superfamily, entailing all entries corresponding to the lipocalin, cytosolic fatty acid-binding protein, and avidin-streptavidin Prosite patterns.
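
The idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the score matrix below is made up for illustration, the function names `svd_embed` and `split_by_component` are hypothetical, and splitting on the sign of the second coordinate is a deliberately simple stand-in for a real clustering step. It only shows how SVD turns a pairwise score matrix into low-dimensional coordinates in which related sequences fall close together.

```python
import numpy as np


def svd_embed(scores, k=2):
    """Project each sequence (one row of the score matrix) onto the
    top-k singular directions. Returns the coordinates and all
    singular values (sorted in descending order by NumPy)."""
    scores = np.asarray(scores, dtype=float)
    u, s, _vt = np.linalg.svd(scores, full_matrices=False)
    return u[:, :k] * s[:k], s


def split_by_component(coords, component=1):
    """Toy two-way grouping: label rows by the sign of one coordinate.
    The overall sign of a singular vector is arbitrary, so only the
    grouping is meaningful, not which group is called 0 or 1."""
    return (coords[:, component] > 0).astype(int)


# Hypothetical pairwise alignment scores for six sequences (assumed data):
# sequences 0-2 score highly against each other, as do sequences 3-5.
scores = np.array([
    [100, 80, 75, 10, 12, 8],
    [80, 100, 78, 11, 9, 10],
    [75, 78, 100, 9, 10, 12],
    [10, 11, 9, 100, 82, 79],
    [12, 9, 10, 82, 100, 77],
    [8, 10, 12, 79, 77, 100],
])

coords, singular_values = svd_embed(scores, k=2)
labels = split_by_component(coords)

# Fraction of the squared singular values captured by the first two components.
retained = np.sum(singular_values[:2] ** 2) / np.sum(singular_values ** 2)

print(labels)
print(round(float(retained), 3))
```

In this toy case the first singular direction mostly reflects the overall score level, while the second separates the two groups. With real alignment scores one would typically keep more components and apply a proper clustering method to the reduced coordinates.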


Similar articles

1. Singular value decomposition analysis of protein sequence alignment score data.
   Proteins. 2002 Feb 1;46(2):161-70. doi: 10.1002/prot.10032.
2. ProClust: improved clustering of protein sequences with an extended graph-based approach.
   Bioinformatics. 2002;18 Suppl 2:S182-91. doi: 10.1093/bioinformatics/18.suppl_2.s182.
3. Singular value decomposition of protein sequences as a method to visualize sequence and residue space.
   Protein Sci. 2022 Oct;31(10):e4422. doi: 10.1002/pro.4422.
4. Fundamentals of massive automatic pairwise alignments of protein sequences: theoretical significance of Z-value statistics.
   Bioinformatics. 2004 Mar 1;20(4):534-7. doi: 10.1093/bioinformatics/btg440. Epub 2004 Jan 22.
5. Protein sequence comparison based on K-string dictionary.
   Gene. 2013 Oct 25;529(2):250-6. doi: 10.1016/j.gene.2013.07.092. Epub 2013 Aug 9.
6. Graph-based clustering for finding distant relationships in a large set of protein sequences.
   Bioinformatics. 2004 Jan 22;20(2):243-52. doi: 10.1093/bioinformatics/btg397.
7. On the significance of sequence alignments when using multiple scoring matrices.
   Bioinformatics. 2004 Apr 12;20(6):881-7. doi: 10.1093/bioinformatics/btg498. Epub 2004 Jan 29.
8. Optimizing substitution matrices by separating score distributions.
   Bioinformatics. 2004 Apr 12;20(6):863-73. doi: 10.1093/bioinformatics/btg494. Epub 2004 Jan 29.
9. Pairwise statistical significance and empirical determination of effective gap opening penalties for protein local sequence alignment.
   Int J Comput Biol Drug Des. 2008;1(4):347-67. doi: 10.1504/ijcbdd.2008.022207.
10. Normalized feature vectors: a novel alignment-free sequence comparison method based on the numbers of adjacent amino acids.
    IEEE/ACM Trans Comput Biol Bioinform. 2013 Mar-Apr;10(2):457-67. doi: 10.1109/TCBB.2013.10.

Cited by

1. A singular value decomposition approach for improved taxonomic classification of biological sequences.
   BMC Genomics. 2011 Dec 22;12 Suppl 4(Suppl 4):S11. doi: 10.1186/1471-2164-12-S4-S11.
2. Amino acid "little Big Bang": representing amino acid substitution matrices as dot products of Euclidian vectors.
   BMC Bioinformatics. 2010 Jan 4;11:4. doi: 10.1186/1471-2105-11-4.
3. Subfamily specific conservation profiles for proteins based on n-gram patterns.
   BMC Bioinformatics. 2008 Jan 30;9:72. doi: 10.1186/1471-2105-9-72.
4. Probabilistic assembly of human protein interaction networks from label-free quantitative proteomics.
   Proc Natl Acad Sci U S A. 2008 Feb 5;105(5):1454-9. doi: 10.1073/pnas.0706983105. Epub 2008 Jan 24.