King Saud University, College of Computer and Information Sciences, Riyadh, Kingdom of Saudi Arabia.
BMC Bioinformatics. 2013;14 Suppl 9(Suppl 9):S4. doi: 10.1186/1471-2105-14-S9-S4. Epub 2013 Jun 28.
Motif discovery is the problem of finding recurring patterns in biological data. Patterns can be sequential, as is typical of motifs found in DNA sequences, or structural, as with RNA motifs. Finding common structural patterns helps to gain a better understanding of mechanisms of action such as post-transcriptional regulation. Unlike DNA motifs, which are conserved in sequence, RNA motifs are conserved in structure, so the same motif may be shared by molecules whose sequences differ. Over the past few years, hundreds of algorithms have been developed to solve the sequential motif discovery problem, while less work has been done on the structural case.
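To illustrate how a structure can be shared by dissimilar sequences, the following minimal Python sketch checks whether a sequence can adopt a given secondary structure written in dot-bracket notation. It is not taken from any of the surveyed tools: the function names, the two example sequences, and the choice to allow only Watson-Crick and G-U wobble pairs are assumptions made for illustration.

```python
# Illustrative sketch only: names and example sequences are hypothetical.
# Assumption: only Watson-Crick pairs and the G-U wobble pair are allowed.
CANONICAL_PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def paired_positions(structure):
    """Return the (i, j) index pairs encoded by a dot-bracket string."""
    stack, pairs = [], []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            pairs.append((stack.pop(), i))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r}")
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return pairs


def is_compatible(sequence, structure):
    """True if every paired position holds an allowed base pair."""
    if len(sequence) != len(structure):
        return False
    return all(
        (sequence[i], sequence[j]) in CANONICAL_PAIRS
        for i, j in paired_positions(structure)
    )


def sequence_identity(a, b):
    """Fraction of aligned positions with identical bases (equal lengths)."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    hairpin = "((((....))))"      # a 4-bp stem closing a 4-nt loop
    seq_a = "GGCAUUCGUGCC"        # hypothetical sequence
    seq_b = "AUGCGAAAGCAU"        # hypothetical sequence
    print(is_compatible(seq_a, hairpin), is_compatible(seq_b, hairpin))
    print(sequence_identity(seq_a, seq_b))
```

Both example sequences are compatible with the same hairpin even though they share no base at any aligned position, which is why a purely sequence-based motif finder could miss the pattern they have in common.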
In this paper, we survey, classify, and compare algorithms that solve the structural motif discovery problem, where the underlying sequences may differ, and we highlight their strengths and weaknesses. We first propose a benchmark dataset and a measurement tool that can be used to evaluate different motif discovery approaches. We then describe our experimental setup. Finally, we use the proposed benchmark to compare the available tools. To the best of our knowledge, this is the first attempt to compare tools designed solely for structural motif discovery.
The results show that the accuracy of the discovered motifs is relatively low. They also suggest complementary behavior among tools: some perform well on simple structures, while others are better suited to complex structures.
We have classified the available structural motif discovery tools and evaluated their performance. In addition, we have proposed a benchmark dataset, together with accompanying tools, that can be used to evaluate newly developed tools.