
Finding motifs using random projections.

Authors

Buhler Jeremy, Tompa Martin

Affiliation

Department of Computer Science, Box 1045, Washington University, One Brookings Drive, St. Louis, MO 63130, USA.

Publication

J Comput Biol. 2002;9(2):225-42. doi: 10.1089/10665270252935430.

DOI:10.1089/10665270252935430
PMID:12015879
Abstract

The DNA motif discovery problem abstracts the task of discovering short, conserved sites in genomic DNA. Pevzner and Sze recently described a precise combinatorial formulation of motif discovery that motivates the following algorithmic challenge: find twenty planted occurrences of a motif of length fifteen in roughly twelve kilobases of genomic sequence, where each occurrence of the motif differs from its consensus in four randomly chosen positions. Such "subtle" motifs, though statistically highly significant, expose a weakness in existing motif-finding algorithms, which typically fail to discover them. Pevzner and Sze introduced new algorithms to solve their (15,4)-motif challenge, but these methods do not scale efficiently to more difficult problems in the same family, such as the (14,4)-, (16,5)-, and (18,6)-motif problems. We introduce a novel motif-discovery algorithm, PROJECTION, designed to enhance the performance of existing motif finders using random projections of the input's substrings. Experiments on synthetic data demonstrate that PROJECTION remedies the weakness observed in existing algorithms, typically solving the difficult (14,4)-, (16,5)-, and (18,6)-motif problems. Our algorithm is robust to nonuniform background sequence distributions and scales to larger amounts of sequence than that specified in the original challenge. A probabilistic estimate suggests that related motif-finding problems that PROJECTION fails to solve are in all likelihood inherently intractable. We also test the performance of our algorithm on realistic biological examples, including transcription factor binding sites in eukaryotes and ribosome binding sites in prokaryotes.
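The planted-motif challenge and the idea of projecting substrings can be illustrated with a minimal Python sketch. The abstract does not give implementation details, so the following are assumptions made for illustration: the instance is laid out as 20 sequences of 600 bases (about twelve kilobases in total) with one occurrence per sequence, the background is uniform over A/C/G/T, and a "random projection" is taken to mean reading each length-15 substring at `k` randomly chosen positions and grouping substrings that agree there. The function names and the value `k=7` are hypothetical and are not taken from the abstract.

```python
import random
from collections import defaultdict

ALPHABET = "ACGT"


def plant_motif(rng, num_seqs=20, seq_len=600, l=15, d=4):
    """Build a synthetic (l, d) instance.

    Each sequence receives one copy of a random consensus of length l,
    mutated at exactly d randomly chosen positions.
    The layout (num_seqs x seq_len) is an assumption for illustration.
    """
    consensus = "".join(rng.choice(ALPHABET) for _ in range(l))
    seqs, sites = [], []
    for _ in range(num_seqs):
        background = [rng.choice(ALPHABET) for _ in range(seq_len)]
        occurrence = list(consensus)
        # Mutate d distinct positions, always to a different base.
        for pos in rng.sample(range(l), d):
            choices = [c for c in ALPHABET if c != consensus[pos]]
            occurrence[pos] = rng.choice(choices)
        start = rng.randrange(seq_len - l + 1)
        background[start:start + l] = occurrence
        seqs.append("".join(background))
        sites.append(start)
    return consensus, seqs, sites


def project_into_buckets(rng, seqs, l=15, k=7):
    """Group every l-mer by its letters at k randomly chosen positions.

    Substrings that agree on the sampled positions share a bucket, so
    occurrences whose mutations miss those positions tend to collide.
    """
    positions = sorted(rng.sample(range(l), k))
    buckets = defaultdict(list)
    for i, seq in enumerate(seqs):
        for j in range(len(seq) - l + 1):
            key = "".join(seq[j + p] for p in positions)
            buckets[key].append((i, j))  # (sequence index, start offset)
    return positions, buckets


if __name__ == "__main__":
    rng = random.Random(0)
    consensus, seqs, sites = plant_motif(rng)
    positions, buckets = project_into_buckets(rng, seqs)
    largest = max(buckets.values(), key=len)
    print("consensus:", consensus)
    print("sampled positions:", positions)
    print("largest bucket size:", len(largest))
```

In a sketch like this, a bucket that collects unusually many substrings is only a candidate; a single projection can easily miss the motif, so one would repeat the projection with fresh positions and pass promising buckets to an existing motif finder for refinement, in line with the abstract's description of PROJECTION as enhancing existing motif finders.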

Similar articles

1
Finding motifs using random projections.
J Comput Biol. 2002;9(2):225-42. doi: 10.1089/10665270252935430.
2
An extension and novel solution to the (l,d)-motif challenge problem.
Genome Inform. 2004;15(2):63-71.
3
A uniform projection method for motif discovery in DNA sequences.
IEEE/ACM Trans Comput Biol Bioinform. 2004 Apr-Jun;1(2):91-4. doi: 10.1109/TCBB.2004.14.
4
Algorithms for challenging motif problems.
J Bioinform Comput Biol. 2006 Feb;4(1):43-58. doi: 10.1142/s0219720006001692.
5
PhyloGibbs: a Gibbs sampling motif finder that incorporates phylogeny.
PLoS Comput Biol. 2005 Dec;1(7):e67. doi: 10.1371/journal.pcbi.0010067. Epub 2005 Dec 9.
6
Hybrid Gibbs-sampling algorithm for challenging motif discovery: GibbsDST.
Genome Inform. 2006;17(2):3-13.
7
MUSA: a parameter free algorithm for the identification of biologically significant motifs.
Bioinformatics. 2006 Dec 15;22(24):2996-3002. doi: 10.1093/bioinformatics/btl537. Epub 2006 Oct 26.
8
Voting algorithms for the motif finding problem.
Comput Syst Bioinformatics Conf. 2008;7:37-47.
9
MotifCut: regulatory motifs finding with maximum density subgraphs.
Bioinformatics. 2006 Jul 15;22(14):e150-7. doi: 10.1093/bioinformatics/btl243.
10
A cluster refinement algorithm for motif discovery.
IEEE/ACM Trans Comput Biol Bioinform. 2010 Oct-Dec;7(4):654-68. doi: 10.1109/TCBB.2009.25.

Cited by

1
Multi-metric locality sensitive hashing enhances alignment accuracy of bisulfite sequencing reads: BisHash.
Bioinform Adv. 2025 Jul 23;5(1):vbaf144. doi: 10.1093/bioadv/vbaf144. eCollection 2025.
2
A Clustering Approach for Motif Discovery in ChIP-Seq Dataset.
Entropy (Basel). 2019 Aug 16;21(8):802. doi: 10.3390/e21080802.
3
Expectation pooling: an effective and interpretable pooling method for predicting DNA-protein binding.
Bioinformatics. 2020 Mar 1;36(5):1405-1412. doi: 10.1093/bioinformatics/btz768.
4
Review of Different Sequence Motif Finding Algorithms.
Avicenna J Med Biotechnol. 2019 Apr-Jun;11(2):130-148.
5
MSC: a metagenomic sequence classification algorithm.
Bioinformatics. 2019 Sep 1;35(17):2932-2940. doi: 10.1093/bioinformatics/bty1071.
6
SamSelect: a sample sequence selection algorithm for quorum planted motif search on large DNA datasets.
BMC Bioinformatics. 2018 Jun 18;19(1):228. doi: 10.1186/s12859-018-2242-y.
7
Noncoding Variants Functional Prioritization Methods Based on Predicted Regulatory Factor Binding Sites.
Curr Genomics. 2017 Aug;18(4):322-331. doi: 10.2174/1389202918666170228143619.
8
An Entropy-Based Position Projection Algorithm for Motif Discovery.
Biomed Res Int. 2016;2016:9127474. doi: 10.1155/2016/9127474. Epub 2016 Nov 2.
9
PairMotifChIP: A Fast Algorithm for Discovery of Patterns Conserved in Large ChIP-seq Data Sets.
Biomed Res Int. 2016;2016:4986707. doi: 10.1155/2016/4986707. Epub 2016 Oct 24.
10
RefSelect: a reference sequence selection algorithm for planted (l, d) motif search.
BMC Bioinformatics. 2016 Jul 19;17 Suppl 9(Suppl 9):266. doi: 10.1186/s12859-016-1130-6.