Abha Singh Bais, Steffen Grossmann, Martin Vingron
Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Ihnestrasse 63-73, D-14195 Berlin, Germany.
Bioinformatics. 2007 Jan 15;23(2):e44-9. doi: 10.1093/bioinformatics/btl305.
Current methods that annotate conserved transcription factor binding sites in an alignment of two regulatory regions perform the alignment and annotation steps separately and combine the results only at the end. If the site descriptions are weak or the sequence similarity is low, the local gap structure of the alignment can make conserved sites hard to detect. It is therefore desirable to have an approach that considers the alignment and the locations of potentially matching sites simultaneously.
With SimAnn we have developed a tool that serves exactly this purpose. By combining the annotation step and the alignment of the two sequences into a single algorithm, it detects conserved sites more clearly. A further advantage is that all parameters are calculated from statistical considerations, which allows SimAnn to be applied successfully with any binding site model of interest. We present the algorithm and the approach to parameter selection, and compare its performance with that of other, non-simultaneous methods on both simulated and real data.
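To make the idea of aligning and annotating in one pass concrete, here is a minimal sketch. It is not SimAnn's actual model. It assumes a global (Needleman-Wunsch-style) alignment with linear gap costs, extended by a hypothetical "site" move that aligns a gapless window of width w in both sequences and rewards it with the position weight matrix (PWM) log-odds scores of both windows minus a fixed cost. The function names, the scoring values, the `site_cost` parameter and the uniform background are all illustrative assumptions.

```python
import math

BASES = "ACGT"  # PWM columns are assumed to be ordered A, C, G, T


def log_odds(window, pwm, bg=0.25):
    """PWM log-odds score of a window against a uniform background."""
    return sum(math.log(pwm[k][BASES.index(b)] / bg) for k, b in enumerate(window))


def joint_align(s, t, pwm, match=1.0, mismatch=-1.0, gap=-2.0, site_cost=4.0):
    """Hypothetical joint alignment/annotation DP (not SimAnn's actual model).

    Besides the usual match/gap moves, a 'site' move aligns a gapless window of
    width w in both sequences and adds the PWM log-odds of both windows minus
    site_cost. Returns the optimal score and the (i, j) start positions of the
    aligned site pairs.
    """
    w, n, m = len(pwm), len(s), len(t)

    def col(a, b):
        return match if a == b else mismatch

    F = [[-math.inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    F[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            cands = []
            if i > 0 and j > 0:
                cands.append((F[i - 1][j - 1] + col(s[i - 1], t[j - 1]), "M"))
            if i > 0:
                cands.append((F[i - 1][j] + gap, "X"))  # s[i-1] against a gap
            if j > 0:
                cands.append((F[i][j - 1] + gap, "Y"))  # t[j-1] against a gap
            if i >= w and j >= w:
                # Site move: gapless window pair scored as alignment + PWM evidence.
                ws, wt = s[i - w:i], t[j - w:j]
                bonus = log_odds(ws, pwm) + log_odds(wt, pwm) - site_cost
                cols = sum(col(a, b) for a, b in zip(ws, wt))
                cands.append((F[i - w][j - w] + cols + bonus, "S"))
            F[i][j], back[i][j] = max(cands, key=lambda c: c[0])
    # Trace back, collecting the positions where the site move was used.
    i, j, sites = n, m, []
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "S":
            i, j = i - w, j - w
            sites.append((i, j))
        elif move == "M":
            i, j = i - 1, j - 1
        elif move == "X":
            i -= 1
        else:
            j -= 1
    return F[n][m], sites[::-1]


# Toy GATA-like PWM with pseudocounts (rows = positions, columns = A, C, G, T).
gata = [[.05, .05, .85, .05], [.85, .05, .05, .05],
        [.05, .05, .05, .85], [.85, .05, .05, .05]]
score, sites = joint_align("TTTGATAAGG", "TTGATAAGG", gata)
print(sites)  # [(3, 2)]
```

Because the site evidence enters the recursion itself, a PWM hit present in both sequences can influence where the gaps are placed so that the two hits end up aligned. A pipeline that first aligns and then annotates has no such feedback.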
A command-line based C++ implementation of SimAnn is available from the authors upon request. In addition, we provide Perl scripts for calculating the input parameters based on statistical considerations.
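The parameter calculation itself is not specified here. The following sketch shows one generic way to derive a site-score parameter from statistical considerations, and it is an assumption rather than a description of what the Perl scripts compute. Dynamic programming over integer-rounded scores gives the exact null distribution of a PWM log-odds score under an i.i.d. background. From that distribution, a cutoff can be chosen for a target false-positive rate `alpha`. The function names, the scale factor of 100 and the uniform background are hypothetical choices.

```python
import math
from collections import defaultdict

BASES = "ACGT"  # PWM columns are assumed to be ordered A, C, G, T
UNIFORM = (0.25, 0.25, 0.25, 0.25)


def score_distribution(pwm, bg=UNIFORM, scale=100):
    """Distribution of the integer-rounded, scaled PWM log-odds score of a
    random window drawn from an i.i.d. background model."""
    dist = {0: 1.0}
    for row in pwm:
        # Per-base integer scores for this PWM position.
        ints = [round(scale * math.log(p / q)) for p, q in zip(row, bg)]
        nxt = defaultdict(float)
        for score, prob in dist.items():
            for k in range(len(BASES)):
                nxt[score + ints[k]] += prob * bg[k]
        dist = dict(nxt)
    return dist


def threshold_for_pvalue(pwm, alpha, bg=UNIFORM, scale=100):
    """Smallest achievable cutoff c with P(score >= c) <= alpha.

    Returns (cutoff in log-odds units, or None if even the maximum score is
    too frequent, and the tail probability of the rounded score).
    """
    dist = score_distribution(pwm, bg, scale)
    tail, cutoff = 0.0, None
    for score in sorted(dist, reverse=True):
        if tail + dist[score] > alpha:
            break
        tail += dist[score]
        cutoff = score
    return (None if cutoff is None else cutoff / scale), tail


# Same toy GATA-like PWM as a usage example (rows = positions).
gata = [[.05, .05, .85, .05], [.85, .05, .05, .05],
        [.05, .05, .05, .85], [.85, .05, .05, .05]]
print(threshold_for_pvalue(gata, 0.01))  # (4.88, 0.00390625)
```

Rounding the scores to integers keeps the number of distinct values small, so the distribution is exact up to that rounding. Finer resolution can be obtained by increasing `scale`, at the cost of more distinct score values.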