MotifCut: regulatory motifs finding with maximum density subgraphs.

Authors

Fratkin Eugene, Naughton Brian T, Brutlag Douglas L, Batzoglou Serafim

Affiliation

Department of Computer Science, Stanford University, California 94305, USA.

Publication

Bioinformatics. 2006 Jul 15;22(14):e150-7. doi: 10.1093/bioinformatics/btl243.

PMID:16873465
Abstract

MOTIVATION

DNA motif finding is one of the core problems in computational biology, for which several probabilistic and discrete approaches have been developed. Most existing methods formulate motif finding as an intractable optimization problem and rely either on expectation maximization (EM) or on local heuristic searches. Another challenge is the choice of motif model: simpler models such as the position-specific scoring matrix (PSSM) impose biologically unrealistic assumptions such as independence of the motif positions, while more involved models are harder to parametrize and learn.

RESULTS

We present MotifCut, a graph-theoretic approach to motif finding leading to a convex optimization problem with a polynomial time solution. We build a graph where the vertices represent all k-mers in the input sequences, and edges represent pairwise k-mer similarity. In this graph, we search for a motif as the maximum density subgraph, which is a set of k-mers that exhibit a large number of pairwise similarities. Our formulation does not make strong assumptions regarding the structure of the motif, and in practice both motifs that fit the PSSM model well and those that exhibit strong dependencies between position pairs are found as dense subgraphs. We benchmark MotifCut on both synthetic and real yeast motifs, and find that it compares favorably to existing popular methods. The ability of MotifCut to detect motifs appears to scale well with increasing input size. Moreover, the motifs we discover are different from those discovered by the other methods.
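
The graph formulation described above can be illustrated with a minimal Python sketch. It is not the MotifCut implementation: the abstract does not specify the similarity measure, the edge weights, or the solver, so everything below is an assumption made for illustration. Here edges connect k-mer occurrences within a Hamming-distance threshold (`max_mismatch`, a hypothetical parameter), edge weight is the number of matching positions, density is taken as total edge weight divided by the number of vertices, and the exact polynomial time optimization is swapped for a simple greedy peeling heuristic that only approximates the densest subgraph. All function names are hypothetical.

```python
from itertools import combinations


def kmers(seqs, k):
    """List every k-mer occurrence as (sequence index, offset, k-mer string)."""
    return [(i, j, s[j:j + k])
            for i, s in enumerate(seqs)
            for j in range(len(s) - k + 1)]


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def build_graph(nodes, max_mismatch):
    """Weighted adjacency dict over k-mer occurrences.

    Assumption: two occurrences are linked when they differ in at most
    max_mismatch positions; the weight is the number of matching positions.
    """
    adj = {v: {} for v in range(len(nodes))}
    for u, v in combinations(range(len(nodes)), 2):
        a, b = nodes[u][2], nodes[v][2]
        d = hamming(a, b)
        if d <= max_mismatch:
            adj[u][v] = adj[v][u] = len(a) - d
    return adj


def greedy_densest_subgraph(adj):
    """Greedy peeling approximation (not an exact maximum density solver).

    Repeatedly delete the vertex of minimum weighted degree and remember
    the intermediate vertex set with the highest density, where
    density = total edge weight / number of vertices.
    """
    adj = {u: dict(nb) for u, nb in adj.items()}  # work on a copy
    deg = {u: sum(nb.values()) for u, nb in adj.items()}
    total = sum(deg.values()) / 2
    alive = set(adj)
    best_density, best_set = 0.0, set(alive)
    while alive:
        density = total / len(alive)
        if density > best_density:
            best_density, best_set = density, set(alive)
        u = min(alive, key=lambda v: deg[v])
        for v, w in adj[u].items():
            deg[v] -= w
            del adj[v][u]
        total -= deg[u]
        alive.remove(u)
        adj[u] = {}
    return best_set, best_density
```

A typical call would be `nodes = kmers(seqs, k)`, then `adj = build_graph(nodes, max_mismatch)`, then `greedy_densest_subgraph(adj)`; the returned vertex indices map back to k-mer occurrences through `nodes`. The sketch ignores practical concerns such as overlapping k-mers within one sequence, reverse complements, and background correction, none of which the abstract describes.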

AVAILABILITY

MotifCut server and other materials can be found at motifcut.stanford.edu.
