Protein sequence comparison based on K-string dictionary.

Affiliation

Department of Mathematics, Statistics and Computer Science, University of Illinois at Chicago, IL 60607-7045, USA.

Publication information

Gene. 2013 Oct 25;529(2):250-6. doi: 10.1016/j.gene.2013.07.092. Epub 2013 Aug 9.

DOI: 10.1016/j.gene.2013.07.092
PMID: 23939466
Abstract

The current K-string-based protein sequence comparisons require large amounts of computer memory because the dimension of the protein vector representation grows exponentially with K. In this paper, we propose a novel concept, the "K-string dictionary", to solve this high-dimensional problem. It allows us to use a much lower dimensional K-string-based frequency or probability vector to represent a protein, and thus significantly reduce the computer memory requirements for their implementation. Furthermore, based on this new concept, we use Singular Value Decomposition to analyze real protein datasets, and the improved protein vector representation allows us to obtain accurate gene trees.
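
The idea in the abstract can be illustrated with a minimal sketch. A frequency vector indexed by every possible K-string over the 20 amino acids has 20^K entries, whereas a vector indexed only by a dictionary of selected K-strings is much shorter. The abstract does not say how the dictionary is built, so the code below assumes it is simply the set of K-strings that actually occur in the dataset. The function names, the toy sequences, and the choice of rank are hypothetical and serve only as illustration; this is not the authors' implementation.

```python
from collections import Counter

import numpy as np


def k_strings(seq, k):
    """Yield every overlapping substring of length k in seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_dictionary(seqs, k):
    """Assumed definition: the sorted set of K-strings observed in the dataset."""
    return sorted({s for seq in seqs for s in k_strings(seq, k)})


def frequency_matrix(seqs, k, dictionary):
    """Return one row per protein holding relative K-string frequencies."""
    index = {s: j for j, s in enumerate(dictionary)}
    mat = np.zeros((len(seqs), len(dictionary)))
    for i, seq in enumerate(seqs):
        counts = Counter(k_strings(seq, k))
        total = sum(counts.values())
        for s, c in counts.items():
            mat[i, index[s]] = c / total
    return mat


def svd_embedding(mat, rank):
    """Project the proteins onto the leading singular directions."""
    u, s, _ = np.linalg.svd(mat, full_matrices=False)
    return u[:, :rank] * s[:rank]


# Toy sequences, invented for illustration only.
proteins = ["MKVLAAGIVG", "MKVLSAGIVG", "GSHMTTPEEL"]
K = 3

dictionary = build_dictionary(proteins, K)
freqs = frequency_matrix(proteins, K, dictionary)
coords = svd_embedding(freqs, rank=2)

# Pairwise Euclidean distances in the reduced space; a distance matrix like
# this could be passed to a tree-building method such as neighbor joining.
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

print(len(dictionary), "dictionary entries versus", 20 ** K, "possible K-strings")
```

In this sketch the memory saving comes entirely from indexing columns by the dictionary instead of by all 20^K strings. The SVD step then works on a matrix with one row per protein and one column per dictionary entry.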

Similar articles

1
Protein sequence comparison based on K-string dictionary.
Gene. 2013 Oct 25;529(2):250-6. doi: 10.1016/j.gene.2013.07.092. Epub 2013 Aug 9.
2
A configuration space of homologous proteins conserving mutual information and allowing a phylogeny inference based on pair-wise Z-score probabilities.
BMC Bioinformatics. 2005 Mar 10;6:49. doi: 10.1186/1471-2105-6-49.
3
Comparison study on k-word statistical measures for protein: from sequence to 'sequence space'.
BMC Bioinformatics. 2008 Sep 23;9:394. doi: 10.1186/1471-2105-9-394.
4
On the quality of tree-based protein classification.
Bioinformatics. 2005 May 1;21(9):1876-90. doi: 10.1093/bioinformatics/bti244. Epub 2005 Jan 12.
5
Kalign--an accurate and fast multiple sequence alignment algorithm.
BMC Bioinformatics. 2005 Dec 12;6:298. doi: 10.1186/1471-2105-6-298.
6
Singular value decomposition analysis of protein sequence alignment score data.
Proteins. 2002 Feb 1;46(2):161-70. doi: 10.1002/prot.10032.
7
Probabilistic multi-class multi-kernel learning: on protein fold recognition and remote homology detection.
Bioinformatics. 2008 May 15;24(10):1264-70. doi: 10.1093/bioinformatics/btn112. Epub 2008 Mar 31.
8
A comprehensive vertebrate phylogeny using vector representations of protein sequences from whole genomes.
Mol Biol Evol. 2002 Apr;19(4):554-62. doi: 10.1093/oxfordjournals.molbev.a004111.
9
Enhanced statistics for local alignment of multiple alignments improves prediction of protein function and structure.
Bioinformatics. 2005 Jul 1;21(13):2950-6. doi: 10.1093/bioinformatics/bti462. Epub 2005 May 3.
10
Progressive structure-based alignment of homologous proteins: Adopting sequence comparison strategies.
Biochimie. 2012 Sep;94(9):2025-34. doi: 10.1016/j.biochi.2012.05.028. Epub 2012 Jun 4.

Cited by

1
Mathematical Approach to Protein Sequence Comparison Based on Physiochemical Properties.
ACS Omega. 2022 Oct 17;7(43):39446-39455. doi: 10.1021/acsomega.2c06103. eCollection 2022 Nov 1.
2
Analysis of the Genomic Distance Between Bat Coronavirus RaTG13 and SARS-CoV-2 Reveals Multiple Origins of COVID-19.
Acta Math Sci. 2021;41(3):1017-1022. doi: 10.1007/s10473-021-0323-x. Epub 2021 Apr 19.
3
A protein structural study based on the centrality analysis of protein sequence feature networks.
PLoS One. 2021 Mar 29;16(3):e0248861. doi: 10.1371/journal.pone.0248861. eCollection 2021.
4
A study on separation of the protein structural types in amino acid sequence feature spaces.
PLoS One. 2019 Dec 23;14(12):e0226768. doi: 10.1371/journal.pone.0226768. eCollection 2019.
5
Drug-Target Interaction Prediction Based on Drug Fingerprint Information and Protein Sequence.
Molecules. 2019 Aug 19;24(16):2999. doi: 10.3390/molecules24162999.
6
Large-Scale Genome Comparison Based on Cumulative Fourier Power and Phase Spectra: Central Moment and Covariance Vector.
Comput Struct Biotechnol J. 2019 Jul 11;17:982-994. doi: 10.1016/j.csbj.2019.07.003. eCollection 2019.
7
DCGR: feature extractions from protein sequences based on CGR via remodeling multiple information.
BMC Bioinformatics. 2019 Jun 20;20(1):351. doi: 10.1186/s12859-019-2943-x.
8
A Novel Approach to Clustering Genome Sequences Using Inter-nucleotide Covariance.
Front Genet. 2019 Apr 9;10:234. doi: 10.3389/fgene.2019.00234. eCollection 2019.
9
Alignment-free similarity analysis for protein sequences based on fuzzy integral.
Sci Rep. 2019 Feb 26;9(1):2775. doi: 10.1038/s41598-019-39477-8.
10
Establishing the phylogeny of Prochlorococcus with a new alignment-free method.
Ecol Evol. 2017 Nov 15;7(24):11057-11065. doi: 10.1002/ece3.3535. eCollection 2017 Dec.