A sequential Monte Carlo EM approach to the transcription factor binding site identification problem.

Author information

Jackson Edmund S, Fitzgerald William J

Affiliation

Signal Processing Laboratory, Department of Engineering, Cambridge University, UK.

Publication information

Bioinformatics. 2007 Jun 1;23(11):1313-20. doi: 10.1093/bioinformatics/btm054. Epub 2007 Mar 25.

Abstract

MOTIVATION

A significant and stubbornly intractable problem in genome sequence analysis has been the de novo identification of transcription factor binding sites in promoter regions. Although theoretically pleasing, probabilistic methods have faced difficulties due to model mismatch and the nature of the biological sequence. These problems result in inference in a high dimensional, highly multimodal space, and consequently often display only local convergence and hence unsatisfactory performance.

ALGORITHM

In this article, we derive and demonstrate a novel method utilizing a sequential Monte Carlo-based expectation-maximization (EM) optimization to improve performance in this scenario. The Monte Carlo element should increase the robustness of the algorithm compared to classical EM. Furthermore, the parallel nature of the sequential Monte Carlo algorithm should be more robust than Gibbs sampling approaches to multimodality problems.

RESULTS

We demonstrate the superior performance of this algorithm on both semi-synthetic and real data from Escherichia coli.

AVAILABILITY

http://sigproc-eng.cam.ac.uk/~ej230/smc_em_tfbsid.tar.gz

Similar articles

1. A sequential Monte Carlo EM approach to the transcription factor binding site identification problem. Bioinformatics. 2007 Jun 1;23(11):1313-20. doi: 10.1093/bioinformatics/btm054. Epub 2007 Mar 25.
2. A profile-based deterministic sequential Monte Carlo algorithm for motif discovery. Bioinformatics. 2008 Jan 1;24(1):46-55. doi: 10.1093/bioinformatics/btm543. Epub 2007 Nov 17.
3. A Gibbs sampler for identification of symmetrically structured, spaced DNA motifs with improved estimation of the signal length. Bioinformatics. 2005 May 15;21(10):2240-5. doi: 10.1093/bioinformatics/bti336. Epub 2005 Feb 22.
4. SPACER: identification of cis-regulatory elements with non-contiguous critical residues. Bioinformatics. 2007 Apr 15;23(8):1029-31. doi: 10.1093/bioinformatics/btm041.
5. Finding motifs from all sequences with and without binding sites. Bioinformatics. 2006 Sep 15;22(18):2217-23. doi: 10.1093/bioinformatics/btl371. Epub 2006 Jul 26.
6. Informative priors based on transcription factor structural class improve de novo motif discovery. Bioinformatics. 2006 Jul 15;22(14):e384-92. doi: 10.1093/bioinformatics/btl251.
7. MotifCut: regulatory motifs finding with maximum density subgraphs. Bioinformatics. 2006 Jul 15;22(14):e150-7. doi: 10.1093/bioinformatics/btl243.
8. TFBS identification based on genetic algorithm with combined representations and adaptive post-processing. Bioinformatics. 2008 Feb 1;24(3):341-9. doi: 10.1093/bioinformatics/btm606. Epub 2007 Dec 6.
9. MUSA: a parameter free algorithm for the identification of biologically significant motifs. Bioinformatics. 2006 Dec 15;22(24):2996-3002. doi: 10.1093/bioinformatics/btl537. Epub 2006 Oct 26.
10. Context-specific independence mixture modeling for positional weight matrices. Bioinformatics. 2006 Jul 15;22(14):e166-73. doi: 10.1093/bioinformatics/btl249.
