SOMEA: self-organizing map based extraction algorithm for DNA motif identification with heterogeneous model.

Affiliation

Department of Computer Science and Computer Engineering, La Trobe University, Melbourne, Victoria 3086, Australia.

Publication information

BMC Bioinformatics. 2011 Feb 15;12 Suppl 1(Suppl 1):S16. doi: 10.1186/1471-2105-12-S1-S16.

DOI:10.1186/1471-2105-12-S1-S16

PMID:21342545

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3044270/

Abstract

BACKGROUND

Discriminating transcription factor binding sites (TFBS) from background sequences plays a key role in computational motif discovery. Current clustering-based algorithms employ a homogeneous model, which assumes that motifs and background signals can be characterized equivalently. This assumption is limiting because the two kinds of sequence signal have distinct properties.

RESULTS

This paper develops a Self-Organizing Map (SOM) based clustering algorithm for extracting binding sites from DNA sequences. The framework relies on a novel intra-node soft competitive procedure to maximize the discrimination of motifs from background signals in datasets. The intra-node competition applies an adaptive weighting technique to two different signal models, so that each of the two classes of signals is better represented. Using several real and artificial datasets, we compared the proposed method with several motif discovery tools. Compared with SOMBRERO, a state-of-the-art SOM-based motif discovery tool, our algorithm achieves significant improvements in average precision (about 27%) on the real datasets without compromising sensitivity. Our method also compared favourably with other motif discovery tools.
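The abstract does not give the formulas behind the intra-node competition, so the following is only a minimal sketch of the general idea under stated assumptions: each SOM node keeps two models of the k-mers assigned to it, and an adaptive weight decides how much each model contributes to a k-mer's score. The class name `HeterogeneousNode`, the choice of a position weight matrix for the motif model, the position-independent base-frequency model for the background, and the information-content-based weight are all hypothetical choices made for illustration, not the authors' definitions.

```python
import math

BASES = "ACGT"


class HeterogeneousNode:
    """One SOM node holding two signal models (illustrative assumption only)."""

    def __init__(self, kmers, pseudocount=1.0):
        self.length = len(kmers[0])

        # Assumed motif model: position-specific base frequencies (a PWM).
        self.pwm = []
        for pos in range(self.length):
            counts = {b: pseudocount for b in BASES}
            for kmer in kmers:
                counts[kmer[pos]] += 1
            total = sum(counts.values())
            self.pwm.append({b: counts[b] / total for b in BASES})

        # Assumed background model: position-independent base frequencies.
        counts = {b: pseudocount for b in BASES}
        for kmer in kmers:
            for base in kmer:
                counts[base] += 1
        total = sum(counts.values())
        self.background = {b: counts[b] / total for b in BASES}

        # Assumed adaptive weight: mean information content per position,
        # scaled to [0, 1]. Conserved nodes lean on the motif model.
        self.weight = self._motif_weight()

    def _motif_weight(self):
        info = 0.0
        for column in self.pwm:
            entropy = -sum(p * math.log2(p) for p in column.values())
            info += 2.0 - entropy  # 2 bits is the maximum entropy for 4 bases
        return info / (2.0 * self.length)

    def motif_loglik(self, kmer):
        return sum(math.log(self.pwm[i][b]) for i, b in enumerate(kmer))

    def background_loglik(self, kmer):
        return sum(math.log(self.background[b]) for b in kmer)

    def score(self, kmer):
        """Weighted combination of the two models' log-likelihoods."""
        w = self.weight
        return w * self.motif_loglik(kmer) + (1.0 - w) * self.background_loglik(kmer)


conserved = HeterogeneousNode(["TATAAA", "TATAAA", "TATAAT", "TATAAA"])
noisy = HeterogeneousNode(["ACGTAC", "GTTCAG", "CAGGTT", "TGCATA"])
print(conserved.weight, noisy.weight)
```

In this sketch a node whose members agree at most positions receives a larger motif weight than a node filled with unrelated k-mers, which is one plausible way a node could lean towards the model that describes its contents better.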

CONCLUSIONS

Motif discovery with a model-based clustering framework should consider using a heterogeneous model to represent the two classes of signals in DNA sequences. Such a heterogeneous model can achieve better signal discrimination than a homogeneous model.
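Signal discrimination is reported in the results in terms of precision and sensitivity. As a rough illustration of how these two quantities relate, the sketch below computes them at the binding-site level. The function name, the representation of a site as a `(sequence_id, start)` pair, and the exact-match criterion are assumptions made here; the abstract does not state how predicted and known sites were matched.

```python
def precision_and_sensitivity(predicted, known):
    """Site-level precision and sensitivity (exact-match assumption)."""
    predicted, known = set(predicted), set(known)
    true_positives = len(predicted & known)
    # Precision: fraction of predicted sites that are known binding sites.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Sensitivity (recall): fraction of known binding sites that were predicted.
    sensitivity = true_positives / len(known) if known else 0.0
    return precision, sensitivity


# Hypothetical sites, written as (sequence_id, start) pairs.
predicted_sites = [("s1", 10), ("s1", 40), ("s2", 5), ("s3", 7)]
known_sites = [("s1", 10), ("s2", 5), ("s2", 30)]
precision, sensitivity = precision_and_sensitivity(predicted_sites, known_sites)
print(precision, sensitivity)
```

A tool that predicts fewer false sites raises precision; sensitivity stays unchanged as long as the same known sites are still recovered, which is the sense in which precision can improve without compromising sensitivity.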
