
LOGOS: a modular Bayesian model for de novo motif detection.

Authors

Xing Eric P, Wu Wei, Jordan Michael I, Karp Richard M

Affiliation

Computer Science Division, University of California, Berkeley, 94720, USA.

Publication

Proc IEEE Comput Soc Bioinform Conf. 2003;2:266-76.

PMID: 16452802

Abstract

The complexity of the global organization and internal structures of motifs in higher eukaryotic organisms raises significant challenges for motif detection techniques. To achieve successful de novo motif detection it is necessary to model the complex dependencies within and among motifs and incorporate biological prior knowledge. In this paper, we present LOGOS, an integrated LOcal and GlObal motif Sequence model for biopolymer sequences, which provides a principled framework for developing, modularizing, extending and computing expressive motif models for complex biopolymer sequence analysis. LOGOS consists of two interacting submodels: HMDM, a local alignment model capturing biological prior knowledge and positional dependence within the motif local structure; and HMM, a global motif distribution model modeling frequencies and dependencies of motif occurrences. Model parameters can be fit using training motifs within an empirical Bayesian framework. A variational EM algorithm is developed for de novo motif detection. LOGOS improves over existing models that ignore biological priors and dependencies in motif structures and motif occurrences, and demonstrates superior performance on both semi-realistic test data and cis-regulatory sequences from yeast and Drosophila sequences with regard to sensitivity, specificity, flexibility and extensibility.
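The abstract's point about incorporating prior knowledge into the local motif model can be illustrated with a much simpler stand-in. The sketch below is not the HMDM or the variational EM algorithm of the paper; it assumes an independent Dirichlet prior on each motif column and an i.i.d. uniform background, and all names (`count_matrix`, `posterior_mean_pwm`, `log_odds_score`) and the toy sites are hypothetical.

```python
import numpy as np

ALPHABET = "ACGT"

def count_matrix(sites):
    """Per-position nucleotide counts for equal-length aligned sites."""
    length = len(sites[0])
    counts = np.zeros((length, len(ALPHABET)))
    for site in sites:
        for pos, base in enumerate(site):
            counts[pos, ALPHABET.index(base)] += 1
    return counts

def posterior_mean_pwm(counts, alpha):
    """Posterior mean of each column's multinomial under a Dirichlet(alpha) prior."""
    smoothed = counts + alpha
    return smoothed / smoothed.sum(axis=1, keepdims=True)

def log_odds_score(window, pwm, background):
    """Log-odds of a window under the motif model versus an i.i.d. background."""
    return float(sum(
        np.log(pwm[pos, ALPHABET.index(base)] / background[ALPHABET.index(base)])
        for pos, base in enumerate(window)
    ))

# Toy training sites (illustrative only, not data from the paper).
sites = ["TGACTC", "TGACTA", "TGAGTC", "TGACTC"]
alpha = np.full(4, 0.5)          # assumed symmetric Dirichlet pseudocounts
background = np.full(4, 0.25)    # assumed uniform background frequencies

pwm = posterior_mean_pwm(count_matrix(sites), alpha)
motif_like = log_odds_score("TGACTC", pwm, background)
unrelated = log_odds_score("AAAAAA", pwm, background)
```

The pseudocounts keep unseen nucleotides from receiving zero probability, which is the simplest way a prior enters a motif model. LOGOS goes further by modeling dependence between positions and between motif occurrences, which this sketch ignores.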


Similar articles

1
LOGOS: a modular Bayesian model for de novo motif detection.
Proc IEEE Comput Soc Bioinform Conf. 2003;2:266-76.
2
Logos: a modular bayesian model for de novo motif detection.
J Bioinform Comput Biol. 2004 Mar;2(1):127-54. doi: 10.1142/s0219720004000508.
3
Generalized hierarchical markov models for the discovery of length-constrained sequence features from genome tiling arrays.
Biometrics. 2007 Sep;63(3):797-805. doi: 10.1111/j.1541-0420.2007.00760.x.
4
BayesMD: flexible biological modeling for motif discovery.
J Comput Biol. 2008 Dec;15(10):1347-63. doi: 10.1089/cmb.2007.0176.
5
A Gibbs sampler for identification of symmetrically structured, spaced DNA motifs with improved estimation of the signal length.
Bioinformatics. 2005 May 15;21(10):2240-5. doi: 10.1093/bioinformatics/bti336. Epub 2005 Feb 22.
6
A profile-based deterministic sequential Monte Carlo algorithm for motif discovery.
Bioinformatics. 2008 Jan 1;24(1):46-55. doi: 10.1093/bioinformatics/btm543. Epub 2007 Nov 17.
7
Training HMM structure with genetic algorithm for biological sequence analysis.
Bioinformatics. 2004 Dec 12;20(18):3613-9. doi: 10.1093/bioinformatics/bth454. Epub 2004 Aug 5.
8
On counting position weight matrix matches in a sequence, with application to discriminative motif finding.
Bioinformatics. 2006 Jul 15;22(14):e454-63. doi: 10.1093/bioinformatics/btl227.
9
A transdimensional Bayesian model for pattern recognition in DNA sequences.
Biostatistics. 2008 Oct;9(4):668-85. doi: 10.1093/biostatistics/kxm058. Epub 2008 Mar 18.
10
MotifCut: regulatory motifs finding with maximum density subgraphs.
Bioinformatics. 2006 Jul 15;22(14):e150-7. doi: 10.1093/bioinformatics/btl243.

Cited by

1
Review of Different Sequence Motif Finding Algorithms.
Avicenna J Med Biotechnol. 2019 Apr-Jun;11(2):130-148.
2
An Empirical Prior Improves Accuracy for Bayesian Estimation of Transcription Factor Binding Site Frequencies within Gene Promoters.
Bioinform Biol Insights. 2016 Oct 25;9(Suppl 4):59-69. doi: 10.4137/BBI.S29330. eCollection 2015.
3
Extracting sequence features to predict protein-DNA interactions: a comparative study.
Nucleic Acids Res. 2008 Jul;36(12):4137-48. doi: 10.1093/nar/gkn361. Epub 2008 Jun 13.
4
CSMET: comparative genomic motif detection via multi-resolution phylogenetic shadowing.
PLoS Comput Biol. 2008 Jun 6;4(6):e1000090. doi: 10.1371/journal.pcbi.1000090.
5
rMotifGen: random motif generator for DNA and protein sequences.
BMC Bioinformatics. 2007 Aug 7;8:292. doi: 10.1186/1471-2105-8-292.
6
Genome-wide mapping of ORC and Mcm2p binding sites on tiling arrays and identification of essential ARS consensus sequences in S. cerevisiae.
BMC Genomics. 2006 Oct 26;7:276. doi: 10.1186/1471-2164-7-276.
7
From sequence to structure and back again: approaches for predicting protein-DNA binding.
Proteome Sci. 2004 Jun 17;2(1):3. doi: 10.1186/1477-5956-2-3.