A probabilistic model for sequence alignment with context-sensitive indels.

Author information

Hickey Glenn, Blanchette Mathieu

Affiliation

Center for Biomolecular Science and Engineering, University of California, Santa Cruz, California 95064, USA.

Publication information

J Comput Biol. 2011 Nov;18(11):1449-64. doi: 10.1089/cmb.2011.0157. Epub 2011 Sep 27.

Abstract

Probabilistic approaches for sequence alignment are usually based on pair Hidden Markov Models (HMMs) or Stochastic Context Free Grammars (SCFGs). Recent studies have shown a significant correlation between the content of short indels and their flanking regions, which by definition cannot be modelled by the above two approaches. In this work, we present a context-sensitive indel model based on a pair Tree-Adjoining Grammar (TAG), along with accompanying algorithms for efficient alignment and parameter estimation. The increased precision and statistical power of this model are shown on simulated and real genomic data. As the cost of sequencing plummets, the usefulness of comparative analysis is becoming limited by alignment accuracy rather than data availability. Our results will therefore have an impact on any downstream comparative genomics analysis that relies on alignments. Fine-grained studies of small functional regions or disease markers, for example, could be significantly improved by our method. The implementation is available at www.mcb.mcgill.ca/~blanchem/software.html.
