Mahony Shaun, Benos Panayiotis V, Smith Terry J, Golden Aaron
Department of Computational Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA.
Neural Netw. 2006 Jul-Aug;19(6-7):950-62. doi: 10.1016/j.neunet.2006.05.023. Epub 2006 Jul 12.
Identification of the short DNA sequence motifs that serve as binding targets for transcription factors is an important challenge in bioinformatics. Unsupervised techniques from the statistical learning theory literature have often been applied to motif discovery, but effective solutions for large genomic datasets have yet to be found. We present here three self-organizing neural networks that are applicable to the motif-finding problem. The core system in this study is a previously described motif-finder named SOMBRERO, which is based on the self-organizing map (SOM). In this work, the motif-finder is integrated with a SOM-based method that automatically constructs generalized models for structurally related motifs and initializes SOMBRERO with relevant biological knowledge. A self-organizing tree method that displays the relationships between various motifs is also presented, and it is shown that such a method can act as an effective structural classifier of novel motifs. The performance of the three self-organizing neural networks is evaluated here using various datasets.