Determination of local statistical significance of patterns in Markov sequences with application to promoter element identification.

Authors

Huang Haiyan, Kao Ming-Chih J, Zhou Xianghong, Liu Jun S, Wong Wing H

Affiliation

Department of Biostatistics, Harvard University, 655 Huntington Avenue, Boston, MA 02115, USA.

Publication

J Comput Biol. 2004;11(1):1-14. doi: 10.1089/106652704773416858.

DOI: 10.1089/106652704773416858
PMID: 15072685

Abstract

High-level eukaryotic genomes present a particular challenge to the computational identification of transcription factor binding sites (TFBSs) because of their long noncoding regions and large numbers of repeat elements. This is evidenced by the noisy results generated by most current methods. In this paper, we present a p-value-based scoring scheme using probability generating functions to evaluate the statistical significance of potential TFBSs. Furthermore, we introduce the local genomic context into the model so that candidate sites are evaluated based both on their similarities to known binding sites and on their contrasts against their respective local genomic contexts. We demonstrate that our approach is advantageous in the prediction of myogenin and MEF2 binding sites in the human genome. We also apply LMM to large-scale human binding site sequences in situ and find that, compared to current popular methods, LMM analysis can reduce false positive errors by more than 50% without compromising sensitivity. This improvement will be of importance to any subsequent algorithm that aims to detect regulatory modules based on known PSSMs.
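
The abstract gives no formulas, so the following is only an illustrative sketch of the general idea, not the authors' LMM procedure. It assumes a PSSM with integer scores and a zeroth-order (i.i.d.) background estimated from a local window, whereas the paper's title refers to Markov sequences. Under these assumptions the score distribution is the coefficient list of the probability generating function G(z) = prod_i sum_b p_b z^{s_i(b)}, and the p-value of a candidate site is the tail mass at or above its score. All function names and the toy matrix are hypothetical.

```python
from collections import defaultdict


def local_background(window):
    """Estimate base frequencies from a local window (pseudocount of 1 per base)."""
    counts = {base: 1.0 for base in "ACGT"}
    for ch in window.upper():
        if ch in counts:
            counts[ch] += 1.0
    total = sum(counts.values())
    return {base: c / total for base, c in counts.items()}


def score_distribution(pssm, background):
    """Exact distribution of the total PSSM score under an i.i.d. background.

    pssm: list of dicts (one per motif position) mapping base -> integer score.
    Returns {score: probability}, i.e. the coefficients of the generating
    function obtained by multiplying one polynomial per position.
    """
    dist = {0: 1.0}
    for column in pssm:
        nxt = defaultdict(float)
        for score, prob in dist.items():
            for base, p_base in background.items():
                nxt[score + column[base]] += prob * p_base
        dist = dict(nxt)
    return dist


def site_score(pssm, site):
    """Sum of per-position scores for a candidate site of motif length."""
    return sum(column[base] for column, base in zip(pssm, site.upper()))


def p_value(pssm, background, observed):
    """P(score >= observed) for a random site drawn from the background."""
    dist = score_distribution(pssm, background)
    return sum(prob for score, prob in dist.items() if score >= observed)


# Toy 3-position matrix (hypothetical values): consensus "ACG".
toy_pssm = [
    {"A": 2, "C": -1, "G": -1, "T": -1},
    {"A": -1, "C": 2, "G": -1, "T": -1},
    {"A": -1, "C": -1, "G": 2, "T": -1},
]
uniform = {base: 0.25 for base in "ACGT"}
observed = site_score(toy_pssm, "ACG")           # 6, the maximum score
p_uniform = p_value(toy_pssm, uniform, observed)  # 1/64 under a uniform background
p_local = p_value(toy_pssm, local_background("ATATATATAT"), observed)
```

In this toy case the same site receives a smaller p-value in an AT-rich window than under a uniform background, which is the kind of contrast against local context that the abstract describes. A Markov background would require tracking the previous base as extra state in the recursion.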

Similar articles

1
Determination of local statistical significance of patterns in Markov sequences with application to promoter element identification.
J Comput Biol. 2004;11(1):1-14. doi: 10.1089/106652704773416858.
2
MAPPER: a search engine for the computational identification of putative transcription factor binding sites in multiple genomes.
BMC Bioinformatics. 2005 Mar 30;6:79. doi: 10.1186/1471-2105-6-79.
3
Computation-based discovery of cis-regulatory modules by hidden Markov model.
J Comput Biol. 2008 Apr;15(3):279-90. doi: 10.1089/cmb.2008.0024.
4
Genome-wide prediction of transcriptional regulatory elements of human promoters using gene expression and promoter analysis data.
BMC Bioinformatics. 2006 Jul 4;7:330. doi: 10.1186/1471-2105-7-330.
5
Integrating genomic data to predict transcription factor binding.
Genome Inform. 2005;16(1):83-94.
6
A hidden Markov model for analyzing ChIP-chip experiments on genome tiling arrays and its application to p53 binding sequences.
Bioinformatics. 2005 Jun;21 Suppl 1:i274-82. doi: 10.1093/bioinformatics/bti1046.
7
AVP induces myogenesis through the transcriptional activation of the myocyte enhancer factor 2.
Mol Endocrinol. 2002 Jun;16(6):1407-16. doi: 10.1210/mend.16.6.0854.
8
Calcineurin initiates skeletal muscle differentiation by activating MEF2 and MyoD.
Differentiation. 2003 Apr;71(3):217-27. doi: 10.1046/j.1432-0436.2003.710303.x.
9
Assessing transcription factor motif drift from noisy decoy sequences.
Genome Inform. 2005;16(1):59-67.
10
Binding site graphs: a new graph theoretical framework for prediction of transcription factor binding sites.
PLoS Comput Biol. 2007 May;3(5):e90. doi: 10.1371/journal.pcbi.0030090. Epub 2007 Apr 10.

Cited by

1
Prediction of Transcription Factor Binding Sites Using a Combined Deep Learning Approach.
Front Oncol. 2022 Jun 3;12:893520. doi: 10.3389/fonc.2022.893520. eCollection 2022.
2
WISCOD: a statistical web-enabled tool for the identification of significant protein coding regions.
Biomed Res Int. 2014;2014:282343. doi: 10.1155/2014/282343. Epub 2014 Sep 15.
3
Approximation of sojourn-times via maximal couplings: motif frequency distributions.
J Math Biol. 2014 Jul;69(1):147-82. doi: 10.1007/s00285-013-0690-6. Epub 2013 Jun 6.
4
Systematic prediction of cis-regulatory elements in the Chlamydomonas reinhardtii genome using comparative genomics.
Plant Physiol. 2012 Oct;160(2):613-23. doi: 10.1104/pp.112.200840. Epub 2012 Aug 22.
5
Optimizing the GATA-3 position weight matrix to improve the identification of novel binding sites.
BMC Genomics. 2012 Aug 22;13:416. doi: 10.1186/1471-2164-13-416.
6
Importance sampling of word patterns in DNA and protein sequences.
J Comput Biol. 2010 Dec;17(12):1697-709. doi: 10.1089/cmb.2008.0233.
7
FITBAR: a web tool for the robust prediction of prokaryotic regulons.
BMC Bioinformatics. 2010 Nov 11;11:554. doi: 10.1186/1471-2105-11-554.
8
Identification of context-dependent motifs by contrasting ChIP binding data.
Bioinformatics. 2010 Nov 15;26(22):2826-32. doi: 10.1093/bioinformatics/btq546. Epub 2010 Sep 23.
9
Finding sequence motifs with Bayesian models incorporating positional information: an application to transcription factor binding sites.
BMC Bioinformatics. 2008 Jun 4;9:262. doi: 10.1186/1471-2105-9-262.
10
Probabilistic inference of transcription factor binding from multiple data sources.
PLoS One. 2008 Mar 26;3(3):e1820. doi: 10.1371/journal.pone.0001820.