Sequence comparison alignment-free approach based on suffix tree and L-words frequency.

Author information

Soares Inês, Goios Ana, Amorim António

Affiliation

Faculdade de Ciências da Universidade do Porto, 4169 Porto, Portugal.

Publication information

ScientificWorldJournal. 2012;2012:450124. doi: 10.1100/2012/450124. Epub 2012 Sep 10.

Abstract

The vast majority of methods available for sequence comparison rely on an initial sequence alignment step, which requires a number of assumptions about evolutionary history and is sometimes very difficult or impossible to perform because of the abundance of gaps (insertions/deletions). In such cases, an alignment-free method would prove valuable. Our method starts by computing a generalized suffix tree of all sequences, which is completed in linear time. Using this tree, the frequency of all possible words of a preset length L (L-words) in each sequence is rapidly calculated. Based on the L-word frequency profile of each sequence, a pairwise standard Euclidean distance is then computed, producing a symmetric genetic distance matrix, which can be used to generate a neighbor-joining dendrogram or a multidimensional scaling graph. We present an improvement to word-counting alignment-free approaches for sequence comparison, by determining a single optimal word length and combining suffix tree structures with the word-counting tasks. Our approach is thus a fast and simple application that proved efficient and powerful when applied to mitochondrial genomes. The algorithm was implemented in Python and is freely available on the web.
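The distance step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it replaces the generalized suffix tree with a plain sliding-window count, which yields the same L-word counts but without the suffix tree's efficiency. The function names (`lword_profile`, `euclidean`, `distance_matrix`), the four-letter DNA alphabet, and the normalization to relative frequencies are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product
from math import sqrt

ALPHABET = "ACGT"  # assumption: DNA sequences over a four-letter alphabet


def lword_profile(seq, L):
    """Relative frequency of every possible L-word in seq.

    Sliding-window counting stands in for the suffix tree used in the paper.
    Words containing symbols outside ALPHABET (e.g. 'N') are left out of the
    profile but still count toward the total number of windows.
    """
    n_windows = len(seq) - L + 1
    counts = Counter(seq[i:i + L] for i in range(n_windows))
    total = max(n_windows, 1)  # avoid division by zero for short sequences
    return [counts["".join(w)] / total for w in product(ALPHABET, repeat=L)]


def euclidean(p, q):
    """Standard Euclidean distance between two equal-length profiles."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def distance_matrix(seqs, L):
    """Symmetric matrix of pairwise distances between L-word profiles."""
    profiles = [lword_profile(s, L) for s in seqs]
    n = len(profiles)
    return [[euclidean(profiles[i], profiles[j]) for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    # Toy sequences, invented for this example only.
    toy = ["ACGTACGT", "ACGTACGA", "TTTTTTTT"]
    for row in distance_matrix(toy, L=2):
        print(["%.3f" % d for d in row])
```

The resulting matrix is the kind of input a neighbor-joining or multidimensional scaling routine would take. How the single optimal word length L is chosen is not specified in the abstract, so L is left here as a free parameter.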

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7fcb/3444837/1272c4b298dd/TSWJ2012-450124.001.jpg
