Simultaneously learning DNA motif along with its position and sequence rank preferences through expectation maximization algorithm.

Author information

Zhang ZhiZhuo, Chang Cheng Wei, Hugo Willy, Cheung Edwin, Sung Wing-Kin

Affiliation

National University of Singapore, Singapore, Singapore.

Publication information

J Comput Biol. 2013 Mar;20(3):237-48. doi: 10.1089/cmb.2012.0233.

DOI: 10.1089/cmb.2012.0233
PMID: 23461573

Abstract

Although de novo motifs can be discovered by mining over-represented sequence patterns, this approach misses some real motifs and generates many false positives. To improve accuracy, one solution is to consider additional binding features (i.e., position preference and sequence rank preference). This information is usually required from the user. This article presents a de novo motif discovery algorithm called SEME (sampling with expectation maximization for motif elicitation), which uses a pure probabilistic mixture model to model the motif's binding features and uses expectation maximization (EM) algorithms to simultaneously learn the sequence motif, position, and sequence rank preferences without asking for any prior knowledge from the user. SEME is both efficient and accurate thanks to two important techniques: variable motif length extension and importance sampling. Using 75 large-scale synthetic datasets, 32 metazoan compendium benchmark datasets, and 164 chromatin immunoprecipitation sequencing (ChIP-Seq) libraries, we demonstrated the superior performance of SEME over existing programs in finding transcription factor (TF) binding sites. SEME is further applied to the more difficult problem of finding co-regulated TF (coTF) motifs in 15 ChIP-Seq libraries. It identified significantly more correct coTF motifs and, at the same time, predicted coTF motifs that better match the known motifs. Finally, we show that the learned position and sequence rank preferences of each coTF reveal potential interaction mechanisms between the primary TF and the coTF within these sites. Some of these findings were further validated by ChIP-Seq experiments on the coTFs. The application is available online.
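The idea of learning a motif and its position preference in the same EM loop can be illustrated with a toy example. The sketch below is not SEME: it omits the sequence rank preference, variable motif length extension, and importance sampling, and it assumes equal-length sequences, exactly one motif occurrence per sequence, and a uniform background. The function `em_motif`, its parameters, and the planted-motif data are hypothetical names introduced only for this illustration. The seed k-mer serves purely as an EM starting point; in practice it would be drawn from candidate substrings of the data.

```python
import random

ALPHABET = "ACGT"


def em_motif(seqs, seed_kmer, n_iter=30, pseudo=0.1):
    """Toy EM: learn a PWM and a start-position preference together.

    Illustrative assumptions: equal-length sequences over ACGT, exactly
    one motif occurrence per sequence, and a uniform background.
    """
    width = len(seed_kmer)
    n_pos = len(seqs[0]) - width + 1
    bg = 1.0 / len(ALPHABET)
    # Initialise the PWM from the seed k-mer: 0.5 for the seed base,
    # the remaining mass split evenly over the other three bases.
    pwm = [{b: (0.5 if b == c else 0.5 / 3) for b in ALPHABET}
           for c in seed_kmer]
    pos_pref = [1.0 / n_pos] * n_pos  # start with no position preference

    for _ in range(n_iter):
        base_counts = [{b: pseudo for b in ALPHABET} for _ in range(width)]
        pos_counts = [pseudo] * n_pos
        for s in seqs:
            # E-step: posterior that the motif starts at each position,
            # from the motif-vs-background likelihood ratio times the
            # current position preference.
            weights = []
            for p in range(n_pos):
                ratio = pos_pref[p]
                for j in range(width):
                    ratio *= pwm[j][s[p + j]] / bg
                weights.append(ratio)
            z = sum(weights)
            # Accumulate expected counts for the M-step.
            for p, w in enumerate(weights):
                post = w / z
                pos_counts[p] += post
                for j in range(width):
                    base_counts[j][s[p + j]] += post
        # M-step: renormalise expected counts into probabilities.
        pwm = [{b: col[b] / sum(col.values()) for b in ALPHABET}
               for col in base_counts]
        total = sum(pos_counts)
        pos_pref = [c / total for c in pos_counts]
    return pwm, pos_pref


# Synthetic data: a made-up motif planted at a fixed offset in random DNA.
rng = random.Random(0)
motif, site = "TTGACGCA", 5
seqs = []
for _ in range(30):
    s = [rng.choice(ALPHABET) for _ in range(30)]
    s[site:site + len(motif)] = motif
    seqs.append("".join(s))

pwm, pos_pref = em_motif(seqs, seed_kmer=seqs[0][site:site + len(motif)])
consensus = "".join(max(col, key=col.get) for col in pwm)
best_start = max(range(len(pos_pref)), key=pos_pref.__getitem__)
print(consensus, best_start)
```

Because the position preference is re-estimated from the same posteriors that update the PWM, neither has to be supplied by the user. A rank preference could, in principle, be added as one more factor in the E-step weight, though how SEME parameterises it is not specified in the abstract.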

