An alignment-free model for comparison of regulatory sequences.

Affiliation

MOAC Doctoral Training Centre, Coventry House, University of Warwick, Coventry, CV4 7AL, UK.

Publication information

Bioinformatics. 2010 Oct 1;26(19):2391-7. doi: 10.1093/bioinformatics/btq453. Epub 2010 Aug 9.

DOI:10.1093/bioinformatics/btq453

PMID:20696736

Abstract

MOTIVATION

Some recent comparative studies have revealed that regulatory regions can retain function over large evolutionary distances, even though their DNA sequences have diverged and are difficult to align. Such enhancers are also known to drive very similar expression patterns. This poses a challenge for the in silico detection of biologically related sequences, because such sequences can only be discovered using alignment-free methods.

RESULTS

Here, we present a new computational framework, the Regulatory Region Scoring (RRS) model, for detecting functional conservation of regulatory sequences using the predicted occupancy levels of transcription factors of interest. We demonstrate that our model can detect functional and/or evolutionary links between some non-alignable enhancers with strong statistical significance. We also identify groups of enhancers that are likely to be similarly regulated. Our model is motivated by previous work on the prediction of expression patterns, and it can capture similarity arising from strong binding sites, weak binding sites and even the statistically significant absence of sites. Our results support the hypothesis that weak binding sites contribute to the functional similarity of sequences. Our model fills a gap between two families of models: detailed, data-intensive models for the prediction of precise spatio-temporal expression patterns on the one hand, and crude, generally applicable models on the other. Our model borrows some of the strengths of each family and addresses their drawbacks.
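The abstract describes RRS only at a high level and does not give its scoring function. As a rough illustration of the general idea of comparing sequences through predicted transcription factor occupancy rather than through an alignment, the sketch below makes several assumptions of its own: toy position weight matrices, a two-state binding probability per window, per-base normalisation, Euclidean distance between occupancy profiles, and a shuffle-based empirical p-value. The names `PWMS`, `expected_occupancy`, `profile`, `distance` and `empirical_p_value` are hypothetical and do not come from the paper; this is not the published RRS model.

```python
import math
import random

# Hypothetical toy PWMs: per-position log-odds scores for two made-up factors.
# Illustrative only; these are NOT matrices used by RRS.
PWMS = {
    "TF_A": [
        {"A": 1.2, "C": -1.0, "G": -1.0, "T": 0.2},
        {"A": -1.0, "C": 1.5, "G": -0.5, "T": -1.0},
        {"A": -1.0, "C": -1.0, "G": 1.5, "T": -0.5},
    ],
    "TF_B": [
        {"A": -0.5, "C": -1.0, "G": -1.0, "T": 1.4},
        {"A": 1.3, "C": -1.0, "G": -0.8, "T": -1.0},
        {"A": 1.3, "C": -1.0, "G": -0.8, "T": -1.0},
    ],
}


def expected_occupancy(seq, pwm, scale=1.0):
    """Expected number of bound windows on the forward strand (assumed model).

    Each window gets a two-state binding probability w / (1 + w) with
    w = scale * exp(score), so weak sites add small but nonzero amounts
    instead of being discarded by a hard score threshold.
    Expects an uppercase A/C/G/T string.
    """
    width = len(pwm)
    total = 0.0
    for i in range(len(seq) - width + 1):
        score = sum(col[base] for col, base in zip(pwm, seq[i:i + width]))
        w = scale * math.exp(score)
        total += w / (1.0 + w)
    return total


def profile(seq, pwms=PWMS):
    """Occupancy per base for each factor, so sequences of different length are comparable."""
    return [expected_occupancy(seq, pwm) / len(seq) for pwm in pwms.values()]


def distance(p, q):
    """Euclidean distance between two occupancy profiles (an assumed choice)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def empirical_p_value(seq_a, seq_b, n_shuffles=200, seed=0):
    """Fraction of shuffled copies of seq_b at least as close to seq_a as seq_b is.

    Shuffling keeps base composition but destroys site arrangement; the +1
    terms give the usual conservative empirical estimate.
    """
    rng = random.Random(seed)
    pa = profile(seq_a)
    observed = distance(pa, profile(seq_b))
    letters = list(seq_b)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        if distance(pa, profile("".join(letters))) <= observed:
            hits += 1
    return (hits + 1) / (n_shuffles + 1)


if __name__ == "__main__":
    enh1 = "ACGTTAAACGGACGTAAACG"
    enh2 = "TAAGACGCCTTAAACGTACG"
    print("profile 1:", profile(enh1))
    print("profile 2:", profile(enh2))
    print("empirical p:", empirical_p_value(enh1, enh2))
```

Summing soft binding probabilities over every window is what lets weak sites shape the profile, and a consistently low occupancy for a factor in both sequences also counts towards similarity, loosely echoing the "absence of sites" idea. A realistic analysis would likely use curated matrices, both strands and a genome-appropriate background model.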

AVAILABILITY

The RRS source code is freely available upon publication of this manuscript: http://www2.warwick.ac.uk/fac/sci/systemsbiology/staff/ott/tools_and_software/rrs.
