Junier Ivan, Hérisson Joan, Képès François
Epigenomics Project, Genopole, CNRS UPS3201, UniverSud Paris, University of Evry, Genopole Campus 1 - Genavenir 6, 5 rue Henri Desbruères - F-91030 EVRY cedex, France.
Algorithms Mol Biol. 2010 Sep 10;5:31. doi: 10.1186/1748-7188-5-31.
The specific position of functionally related genes along the DNA has been shown to reflect the interplay between chromosome structure and genetic regulation. By investigating the statistical properties of the distances separating such genes, several studies have highlighted various periodic trends. In many cases, however, groups built from co-functional or co-regulated genes are small and contain wrong information (data contamination), so that their statistics are hard to exploit. In addition, gene positions are not expected to follow a perfectly ordered pattern along the DNA. In this context, we present an algorithm that aims to highlight periodic patterns in sparse boolean sequences, i.e. sequences of the type 010011011010... in which the ratio of the number of 1's (here denoting the transcription start of a gene) to the number of 0's is small.
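To make the setting concrete, the Python sketch below represents a sparse boolean sequence by the positions of its 1's and builds a synthetic test sequence with the kinds of distortion discussed here (wrong 1's, missing 1's, positional disorder). The generator `synthetic_sequence` and all of its parameters (missing-site probability, jitter amplitude, number of spurious 1's, seed) are hypothetical illustration choices, not part of the published software.

```python
import random


def positions_of_ones(seq: str):
    """Sparse representation: indices of the '1' characters (e.g. gene starts)."""
    return [i for i, ch in enumerate(seq) if ch == "1"]


def density(seq: str) -> float:
    """Fraction of sites carrying a 1 (small for the sequences considered here)."""
    return seq.count("1") / len(seq) if seq else 0.0


def synthetic_sequence(length: int, period: int, phase: int = 0,
                       p_missing: float = 0.2, n_contaminants: int = 10,
                       jitter: int = 1, seed: int = 0) -> str:
    """Hypothetical test generator: a periodic train of 1's with missing sites,
    positional noise and spurious 1's added at random positions."""
    rng = random.Random(seed)
    bits = [0] * length
    for x in range(phase, length, period):
        if rng.random() < p_missing:            # missing data: drop this site
            continue
        y = x + rng.randint(-jitter, jitter)    # noise: shift by at most `jitter`
        if 0 <= y < length:
            bits[y] = 1
    for _ in range(n_contaminants):             # contamination: random extra 1's
        bits[rng.randrange(length)] = 1
    return "".join(str(b) for b in bits)


seq = synthetic_sequence(length=400, period=20, phase=5)
pos = positions_of_ones(seq)
print(len(pos), round(density(seq), 3))
```

With a fixed seed the output is reproducible, which makes it convenient to compare period-detection methods on identical inputs.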
The algorithm is particularly robust to strong signal distortions such as the addition of 1's at arbitrary positions (contaminated data), the deletion of existing 1's from the sequence (missing data) and disorder in the positions of the 1's (noise). This robustness stems from an appropriate exploitation of the remarkable alignment properties of periodic points in solenoidal coordinates.
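The alignment idea can be illustrated with a minimal sketch, assuming integer positions and a candidate period P. Wrapping the sequence on a solenoid with P sites per turn gives each position x an angle 2π(x mod P)/P, so the 1's of a P-periodic train all share (nearly) the same angle. The score used below, a z-score of the largest number of 1's falling in a narrow angular window against a uniform background, is a simplified stand-in chosen for this illustration, not the statistic used by the authors; the function names, the window width and the test data are assumptions.

```python
import math


def solenoidal_coordinates(x, period):
    """Wrap position x on a solenoid with `period` sites per turn.
    Returns (turn index, angle in radians); points x = phase + k * period
    all share the same angle, i.e. they are aligned along the solenoid axis."""
    return x // period, 2.0 * math.pi * (x % period) / period


def aligned_count(positions, period, width):
    """Largest number of 1's whose angle falls in one circular window
    spanning `width` consecutive residues modulo `period`."""
    w = min(width, period)
    counts = [0] * period
    for x in positions:
        counts[x % period] += 1
    return max(sum(counts[(s + d) % period] for d in range(w))
               for s in range(period))


def alignment_zscore(positions, period, width=3):
    """Excess of aligned 1's over a uniform background, in standard deviations.
    Under the background each 1 lands in a given window with probability
    p = width / period (binomial model)."""
    n = len(positions)
    p = min(width, period) / period
    if n == 0 or p >= 1.0:
        return 0.0
    k = aligned_count(positions, period, width)
    return (k - n * p) / math.sqrt(n * p * (1.0 - p))


def scan_periods(positions, periods, width=3):
    """Score every candidate period; the best one maximizes the z-score."""
    return {P: alignment_zscore(positions, P, width) for P in periods}


# Illustrative data (not from the paper): period 20, phase 5, length 400.
true_sites = [5 + 20 * k for k in range(20)]
kept = [x for k, x in enumerate(true_sites) if k not in (3, 11)]      # missing data
noisy = [x + 1 if ((x - 5) // 20) % 4 == 0 else x for x in kept]      # positional noise
contaminants = [12, 50, 97, 133, 178, 214, 259, 301, 348, 377]        # spurious 1's
demo_positions = sorted(noisy + contaminants)

scores = scan_periods(demo_positions, range(10, 41))
best = max(scores, key=scores.get)
print(best)  # 20
```

On this toy input the scan selects period 20 rather than its submultiple 10 or its multiple 40: at period 10 the window covers a larger fraction of the circle, so spurious 1's raise the expected background count, and at period 40 the periodic 1's split into two angular clusters.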
The efficiency of the algorithm is demonstrated in situations where standard Fourier-based spectral methods are poorly adapted. We also show how the proposed framework makes it possible to identify the 1's that participate in the periodic trends, i.e. how it can allocate a positional score to genes, in the same spirit as the sequence score. The software is available for public use at http://www.issb.genopole.fr/MEGA/Softwares/iSSB_SolenoidalApplication.zip.
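Once a period has been selected, one simple way to flag the 1's that take part in the trend is to keep those whose angle falls inside the best-aligned window; the others are candidate contaminants. This is a minimal binary sketch of the idea of a positional score, assuming the period and window width are already known; `best_window` and `participating_ones` are hypothetical helpers, not the software's API.

```python
def best_window(positions, period, width=3):
    """Return (start residue, count) of the circular window of `width`
    consecutive residues modulo `period` that holds the most 1's."""
    w = min(width, period)
    counts = [0] * period
    for x in positions:
        counts[x % period] += 1
    best_start, best_count = 0, -1
    for start in range(period):
        c = sum(counts[(start + d) % period] for d in range(w))
        if c > best_count:          # keep the first window reaching the maximum
            best_start, best_count = start, c
    return best_start, best_count


def participating_ones(positions, period, width=3):
    """Map each 1 to True if its phase lies in the best-aligned window
    (it follows the periodic trend), else False (candidate contaminant)."""
    start, _ = best_window(positions, period, width)
    w = min(width, period)
    return {x: (x - start) % period < w for x in positions}


# Hypothetical gene start positions: a period-20 train plus three outliers.
genes = [5, 25, 46, 65, 85, 105, 12, 50, 97]
flags = participating_ones(genes, period=20, width=3)
print([x for x, ok in flags.items() if ok])  # [5, 25, 46, 65, 85, 105]
```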