Bhaskar Anand, Keich Uri
Computer Science Division, University of California, Berkeley, CA, USA.
Stat Appl Genet Mol Biol. 2010;9:Article28. doi: 10.2202/1544-6115.1544. Epub 2010 Jul 2.
We present a method for estimating the number of DNA replication origins in the genome of the yeast Kluyveromyces lactis and for providing a confidence interval for that estimate. The method requires an initial set of verified sites from which a position-specific frequency matrix (PSFM) can be constructed. We further assume access to an experimental procedure, to be used sparingly, that can verify the functionality of a few, but not all, computationally predicted sites. While our motivation comes from estimating the number of autonomously replicating sequences (ARSs), the method can also be applied to estimating the genome-wide number of "functional" transcription factor binding sites, where functionality is determined by experimental verification of the transcription factor binding event using, for example, ChIP data. We demonstrate the reliability of the method by correctly predicting the known number of Saccharomyces cerevisiae ARSs as well as the number of S. cerevisiae probes that bind the transcription factor ABF1.
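As an illustration of the PSFM mentioned above, the following minimal Python sketch builds per-position base frequencies from a set of aligned sites of equal length. It is not the authors' implementation: the function name `build_psfm`, the optional `pseudocount` parameter, and the toy sequences are all hypothetical and chosen only for the example.

```python
from collections import Counter

ALPHABET = "ACGT"  # assumption: sites contain only unambiguous DNA bases


def build_psfm(sites, pseudocount=0.0):
    """Return one dict per alignment column mapping base -> frequency.

    `sites` is a list of equal-length, already aligned DNA strings.
    `pseudocount` is a hypothetical smoothing parameter added to each
    base count; it is not taken from the abstract.
    """
    if not sites:
        raise ValueError("need at least one site")
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")

    total = len(sites) + pseudocount * len(ALPHABET)
    psfm = []
    for i in range(width):
        # Count how often each base occurs in column i.
        counts = Counter(s[i].upper() for s in sites)
        psfm.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return psfm


# Toy data for illustration only; these are not real ARS sequences.
toy_sites = ["ACGT", "ACGA", "TCGT", "ACTT"]
psfm = build_psfm(toy_sites)
for position, column in enumerate(psfm):
    print(position, column)
```

Each column of the result sums to 1, so the matrix can be read as a per-position probability distribution over bases. How such a matrix is then used to score candidate sites and to derive the estimate and its confidence interval is not specified in the abstract and is not sketched here.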