Optimized mixed Markov models for motif identification.

Authors

Huang Weichun, Umbach David M, Ohler Uwe, Li Leping

Affiliation

Bioinformatics Research Center, North Carolina State University, Raleigh, NC 27606, USA.

Publication

BMC Bioinformatics. 2006 Jun 2;7:279. doi: 10.1186/1471-2105-7-279.

DOI:10.1186/1471-2105-7-279
PMID:16749929
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1534070/
Abstract

BACKGROUND

Identifying functional elements, such as transcriptional factor binding sites, is a fundamental step in reconstructing gene regulatory networks and remains a challenging issue, largely due to limited availability of training samples.

RESULTS

We introduce a novel and flexible model, the Optimized Mixture Markov model (OMiMa), and related methods to allow adjustment of model complexity for different motifs. In comparison with other leading methods, OMiMa can incorporate more than the NNSplice's pairwise dependencies; OMiMa avoids model over-fitting better than the Permuted Variable Length Markov Model (PVLMM); and OMiMa requires smaller training samples than the Maximum Entropy Model (MEM). Testing on both simulated and actual data (regulatory cis-elements and splice sites), we found OMiMa's performance superior to the other leading methods in terms of prediction accuracy, required size of training data or computational time. Our OMiMa system, to our knowledge, is the only motif finding tool that incorporates automatic selection of the best model. OMiMa is freely available at 1.

CONCLUSION

Our optimized mixture of Markov models represents an alternative to the existing methods for modeling dependent structures within a biological motif. Our model is conceptually simple and effective, and can improve prediction accuracy and/or computational speed over other leading methods.
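The abstract's idea of adjusting model complexity per motif can be illustrated with a small sketch. It is not OMiMa itself: the abstract does not describe the mixture formulation or the selection criterion, so the code below assumes position-specific Markov models of order 0 and order 1 and picks between them with the Bayesian information criterion (BIC). All function names, the pseudocount value, and the use of BIC are assumptions made for illustration only.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def fit_position_model(sites, order, pseudo=0.5):
    """Fit a position-specific Markov model of the given order to aligned sites.

    Returns one (k, probs) pair per motif position, where k is the context
    length actually used there and probs maps (context, base) -> probability.
    """
    length = len(sites[0])
    model = []
    for pos in range(length):
        k = min(order, pos)  # shorter context near the start of the motif
        counts = Counter((s[pos - k:pos], s[pos]) for s in sites)
        contexts = {s[pos - k:pos] for s in sites}
        probs = {}
        for ctx in contexts:
            total = sum(counts[(ctx, b)] for b in ALPHABET) + pseudo * len(ALPHABET)
            for b in ALPHABET:
                probs[(ctx, b)] = (counts[(ctx, b)] + pseudo) / total
        model.append((k, probs))
    return model


def log_likelihood(model, seq):
    """Log-probability of one sequence; unseen contexts fall back to uniform."""
    total = 0.0
    for pos, (k, probs) in enumerate(model):
        ctx = seq[pos - k:pos]
        total += math.log(probs.get((ctx, seq[pos]), 1.0 / len(ALPHABET)))
    return total


def n_free_params(model):
    """Three free probabilities per context, 4**k possible contexts per position."""
    return sum((len(ALPHABET) - 1) * len(ALPHABET) ** k for k, _ in model)


def bic(model, sites):
    """BIC computed from the smoothed estimates (an approximation to the MLE-based BIC)."""
    ll = sum(log_likelihood(model, s) for s in sites)
    return -2.0 * ll + n_free_params(model) * math.log(len(sites))


def select_order(sites, orders=(0, 1)):
    """Return the candidate order whose fitted model has the lowest BIC."""
    return min(orders, key=lambda o: bic(fit_position_model(sites, o), sites))


if __name__ == "__main__":
    dependent = ["AT"] * 20 + ["TA"] * 20             # position 2 is determined by position 1
    independent = ["AA", "AT", "TA", "TT"] * 10       # positions vary independently
    print(select_order(dependent), select_order(independent))
```

On the toy data, the strongly dependent sites favor the first-order model, while the independent sites favor the simpler zeroth-order model because the extra parameters do not pay for themselves. That trade-off between fit and parameter count is the kind of complexity adjustment the abstract refers to, shown here under the stated assumptions.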

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/87b1/1534070/0b984360e9cf/1471-2105-7-279-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/87b1/1534070/fc27e49cab46/1471-2105-7-279-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/87b1/1534070/86e689d63e36/1471-2105-7-279-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/87b1/1534070/d079bc4ada17/1471-2105-7-279-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/87b1/1534070/bc64cbae58df/1471-2105-7-279-5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/87b1/1534070/d3181bc4c78e/1471-2105-7-279-6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/87b1/1534070/ad0710fe8722/1471-2105-7-279-7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/87b1/1534070/17ee14b30aff/1471-2105-7-279-8.jpg