SOMEA: self-organizing map based extraction algorithm for DNA motif identification with heterogeneous model.

Affiliation

Department of Computer Science and Computer Engineering, La Trobe University, Melbourne, Victoria 3086, Australia.

Publication

BMC Bioinformatics. 2011 Feb 15;12 Suppl 1(Suppl 1):S16. doi: 10.1186/1471-2105-12-S1-S16.

Abstract

BACKGROUND

Discriminating transcription factor binding sites (TFBS) from background sequences plays a key role in computational motif discovery. Current clustering-based algorithms employ a homogeneous model, which assumes that motifs and background signals can be characterized in the same way. This assumption is limiting because the two classes of sequence signals have distinct properties.
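To make the homogeneous assumption concrete, here is a minimal Python sketch. It assumes, for illustration only, that every cluster node is a position weight matrix (PWM) and that each k-mer is assigned to the node with the highest log-likelihood. It is not SOMBRERO's or any other published tool's implementation, and the names `profile_from_kmers` and `best_matching_node` are hypothetical.

```python
import math
from typing import Dict, List

BASES = "ACGT"
PWM = List[Dict[str, float]]  # one {base: probability} dict per motif position


def profile_from_kmers(kmers: List[str], pseudocount: float = 0.5) -> PWM:
    """Estimate a PWM from equal-length k-mers; the pseudocount avoids log(0)."""
    pwm = []
    for pos in range(len(kmers[0])):
        counts = {b: pseudocount for b in BASES}
        for kmer in kmers:
            counts[kmer[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: counts[b] / total for b in BASES})
    return pwm


def log_likelihood(kmer: str, pwm: PWM) -> float:
    """Log-probability of a k-mer under a PWM, treating positions as independent."""
    return sum(math.log(col[b]) for b, col in zip(kmer, pwm))


def best_matching_node(kmer: str, nodes: List[PWM]) -> int:
    """Homogeneous model: every node is the same kind of model (a PWM), so a
    background k-mer is scored exactly like a binding site and always lands in some node."""
    return max(range(len(nodes)), key=lambda i: log_likelihood(kmer, nodes[i]))


if __name__ == "__main__":
    nodes = [profile_from_kmers(["TATAAA", "TATAAT", "TATATA"]),
             profile_from_kmers(["GCGCGC", "GCGGCC", "CGCGCG"])]
    print(best_matching_node("TATAAA", nodes))  # 0
    print(best_matching_node("GCGCGC", nodes))  # 1
```

In this sketch nothing represents "background" as a separate class: every k-mer, motif or not, is forced into whichever PWM fits it best, which is the kind of equivalence the homogeneous assumption implies.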

RESULTS

This paper develops a Self-Organizing Map (SOM) based clustering algorithm for extracting binding sites from DNA sequences. Our framework is built on a novel intra-node soft competitive procedure that aims to maximize the discrimination of motifs from background signals in the datasets. The intra-node competition uses an adaptive weighting technique over two different signal models, so that each of the two classes of signals is better represented. Using several real and artificial datasets, we compared the proposed method with several motif discovery tools. Compared with SOMBRERO, a state-of-the-art SOM-based motif discovery tool, our algorithm achieves a significant improvement in average precision (about 27%) on the real datasets without compromising sensitivity. Our method also performed favourably against the other motif discovery tools.
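The abstract does not give the update equations, so the following is only a minimal sketch of the general idea under stated assumptions. A single node is assumed to hold two different signal models: a PWM for the motif and a zero-order (independent-base) background model, combined through a mixing weight `lam`. The soft competition is written as a standard two-component mixture posterior, and the "adaptive weighting" is a stand-in EM-style re-estimate of `lam`. Neither rule is taken from SOMEA, and all function names here are hypothetical.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"
PWM = List[Dict[str, float]]  # one {base: probability} dict per motif position


def seed_pwm(seed: str, match: float = 0.7) -> PWM:
    """Hypothetical initializer: `match` for the seed base, the rest shared equally."""
    other = (1.0 - match) / 3
    return [{b: (match if b == s else other) for b in BASES} for s in seed]


def log_motif(kmer: str, pwm: PWM) -> float:
    return sum(math.log(col[b]) for b, col in zip(kmer, pwm))


def log_background(kmer: str, bg: Dict[str, float]) -> float:
    # Assumed zero-order background: every position drawn from one base composition.
    return sum(math.log(bg[b]) for b in kmer)


def responsibility(kmer: str, pwm: PWM, bg: Dict[str, float], lam: float) -> float:
    """Soft competition inside one node: posterior probability that `kmer`
    came from the motif model rather than the background model."""
    d = (math.log(lam) + log_motif(kmer, pwm)) - (math.log(1.0 - lam) + log_background(kmer, bg))
    # Numerically stable logistic function of the log-odds d.
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


def update_node(kmers: List[str], pwm: PWM, bg: Dict[str, float], lam: float,
                pseudocount: float = 0.1) -> Tuple[PWM, float, List[float]]:
    """One EM-style step: re-estimate the mixing weight and the motif PWM
    from responsibility-weighted counts (an assumed stand-in, not SOMEA's rule)."""
    r = [responsibility(k, pwm, bg, lam) for k in kmers]
    # Adaptive weighting: the motif share becomes the mean responsibility,
    # clamped away from 0 and 1 so the logs above stay finite.
    new_lam = min(max(sum(r) / len(r), 1e-6), 1.0 - 1e-6)
    new_pwm = []
    for pos in range(len(pwm)):
        counts = {b: pseudocount for b in BASES}
        for kmer, w in zip(kmers, r):
            counts[kmer[pos]] += w  # background-like k-mers contribute little
        total = sum(counts.values())
        new_pwm.append({b: counts[b] / total for b in BASES})
    return new_pwm, new_lam, r


if __name__ == "__main__":
    bg = {b: 0.25 for b in BASES}
    kmers = ["TATAAT", "TATAAA", "GCCGTA", "CAGGTC"]
    pwm, lam = seed_pwm("TATAAT"), 0.5
    for _ in range(3):
        pwm, lam, r = update_node(kmers, pwm, bg, lam)
        print([round(x, 3) for x in r], round(lam, 3))
```

In a full SOM, a step like this would run for every node over the k-mers mapped to it (and its neighbourhood), which the sketch leaves out. The point it illustrates is that a background-like k-mer can fall into a node without pulling the motif PWM toward it, because its weight `r` is small.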

CONCLUSIONS

Motif discovery within a model-based clustering framework should consider using a heterogeneous model to represent the two classes of signals in DNA sequences. Such a heterogeneous model can achieve better signal discrimination than a homogeneous model.
