Protein sequence comparison based on K-string dictionary.

Affiliation

Department of Mathematics, Statistics and Computer Science, University of Illinois at Chicago, IL 60607-7045, USA.

Publication information

Gene. 2013 Oct 25;529(2):250-6. doi: 10.1016/j.gene.2013.07.092. Epub 2013 Aug 9.

Abstract

The current K-string-based protein sequence comparisons require large amounts of computer memory because the dimension of the protein vector representation grows exponentially with K. In this paper, we propose a novel concept, the "K-string dictionary", to solve this high-dimensional problem. It allows us to use a much lower dimensional K-string-based frequency or probability vector to represent a protein, and thus significantly reduce the computer memory requirements for their implementation. Furthermore, based on this new concept, we use Singular Value Decomposition to analyze real protein datasets, and the improved protein vector representation allows us to obtain accurate gene trees.
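The memory argument can be illustrated with a minimal sketch. The abstract does not spell out how the K-string dictionary is built, so the code below assumes one plausible reading: the dictionary holds only the K-strings that actually occur in the dataset, and each protein is represented by its frequencies over those entries rather than over all 20^K possible strings. The function names, the toy sequences, and the Euclidean distance are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter
from math import sqrt


def k_strings(seq, k):
    """Return all overlapping length-k substrings of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_dictionary(seqs, k):
    """Assumed dictionary: the sorted set of K-strings observed in the data."""
    return sorted({s for seq in seqs for s in k_strings(seq, k)})


def frequency_vector(seq, k, dictionary):
    """Relative frequency of each dictionary entry within one sequence."""
    counts = Counter(k_strings(seq, k))
    total = sum(counts.values())
    return [counts[s] / total if total else 0.0 for s in dictionary]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Toy sequences invented for illustration; they are not real proteins.
seqs = ["MKVLAAGIV", "MKVLSAGIV", "GGHHEELLK"]
K = 3

dictionary = build_dictionary(seqs, K)
vectors = [frequency_vector(seq, K, dictionary) for seq in seqs]

# The dictionary dimension is bounded by the data, not by 20 ** K.
print(len(dictionary), "dimensions instead of", 20 ** K)
print(euclidean(vectors[0], vectors[1]) < euclidean(vectors[0], vectors[2]))
```

For these toy inputs the dictionary has 17 entries, against 8000 for the full 3-string space. The rows of `vectors` form a protein-by-K-string matrix that could then be passed to a Singular Value Decomposition routine (for example `numpy.linalg.svd`) before distances are computed; that step is left out here to keep the sketch dependency-free.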
