NestedMICA: sensitive inference of over-represented motifs in nucleic acid sequence.

Authors

Down Thomas A, Hubbard Tim J P

Affiliation

Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK.

Publication information

Nucleic Acids Res. 2005 Mar 10;33(5):1445-53. doi: 10.1093/nar/gki282. Print 2005.

DOI: 10.1093/nar/gki282
PMID: 15760844
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1064142/
Abstract

NestedMICA is a new, scalable pattern-discovery system for finding transcription factor binding sites and similar motifs in biological sequences. Like several previous methods, NestedMICA tackles this problem by optimizing a probabilistic mixture model to fit a set of sequences. However, the use of a newly developed inference strategy called Nested Sampling means NestedMICA is able to find optimal solutions without the need for a problematic initialization or seeding step. We investigate the performance of NestedMICA in a range of scenarios, on synthetic data and a well-characterized set of muscle regulatory regions, and compare it with the popular MEME program. We show that the new method is significantly more sensitive than MEME: in one case, it successfully extracted a target motif from background sequence four times longer than could be handled by the existing program. It also performs robustly on synthetic sequences containing multiple significant motifs. When tested on a real set of regulatory sequences, NestedMICA produced motifs that were good predictors for all five abundant classes of annotated binding sites.
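
To make the idea of a "probabilistic mixture model" over sequences concrete, the following is a minimal Python sketch of a generic zero-or-one-occurrence motif mixture: each sequence is either pure background, or background with a single motif instance at an unknown position. It is not NestedMICA's actual model, which the abstract does not spell out, and it does not implement Nested Sampling. The toy weight matrix `MOTIF`, the uniform `BACKGROUND`, the mixing weight `p_motif`, and all function names are assumptions introduced here for illustration.

```python
import math

# Hypothetical toy weight matrix for a 3-base motif resembling "TGA":
# one dict of base probabilities per motif column (illustrative values only).
MOTIF = [
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
]
# Assumed uniform, zeroth-order background model.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def sequence_log_likelihood(seq, motif, background, p_motif=0.5):
    """Log-likelihood of one sequence under a two-component mixture.

    With probability 1 - p_motif the sequence is pure background; with
    probability p_motif it holds one motif instance at a start position
    chosen uniformly at random. p_motif must lie strictly between 0 and 1.
    """
    bg_ll = sum(math.log(background[base]) for base in seq)
    n_positions = len(seq) - len(motif) + 1
    if n_positions <= 0:
        # Sequence too short to hold the motif: background only.
        return bg_ll
    terms = [math.log(1.0 - p_motif) + bg_ll]
    for start in range(n_positions):
        # Swap background probabilities for motif probabilities in the window.
        ll = bg_ll
        for offset, column in enumerate(motif):
            base = seq[start + offset]
            ll += math.log(column[base]) - math.log(background[base])
        terms.append(math.log(p_motif) - math.log(n_positions) + ll)
    return log_sum_exp(terms)


def dataset_log_likelihood(seqs, motif, background, p_motif=0.5):
    """Sum of per-sequence log-likelihoods (sequences assumed independent)."""
    return sum(sequence_log_likelihood(s, motif, background, p_motif) for s in seqs)


with_site = sequence_log_likelihood("CCTGACC", MOTIF, BACKGROUND)
without_site = sequence_log_likelihood("CCCCCCC", MOTIF, BACKGROUND)
print(with_site > without_site)
```

A motif finder of this family searches for the weight matrix (and mixing weight) that maximizes such a likelihood over the whole sequence set. According to the abstract, NestedMICA's contribution is to carry out that search with Nested Sampling, so no initialization or seeding step is needed.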

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/481e/1064142/65d700802605/gki282f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/481e/1064142/e183b49ba7e8/gki282f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/481e/1064142/8ccb4d6c74eb/gki282f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/481e/1064142/8c6b1974798d/gki282f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/481e/1064142/6866fbc39822/gki282f5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/481e/1064142/6e33a2c0ccf8/gki282f6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/481e/1064142/37347667772d/gki282f7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/481e/1064142/e6b477ec2fe9/gki282f8.jpg
