Baumbach Jan, Wittkop Tobias, Weile Jochen, Kohl Thomas, Rahmann Sven
Genome Informatics, Faculty of Technology, Bielefeld University, D-33594 Bielefeld, Germany.
J Integr Bioinform. 2008 Aug 25;5(2):91. doi: 10.2390/biecoll-jib-2008-91.
Precise experimental identification of transcription factor binding motifs (TFBMs), accurate to a single base pair, is time-consuming and difficult. For several databases, TFBM annotations are extracted from the literature and stored 5' → 3' relative to the target gene. Mixing the two possible orientations of a motif lowers the information content of the position frequency matrices (PFMs) and sequence logos subsequently computed from them. Since these PFMs are used to predict further TFBMs, we ask whether the TFBMs underlying a PFM can be re-annotated automatically to improve both the information content of the PFM and subsequent classification performance.
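To illustrate why mixed orientations hurt, the following minimal sketch computes the information content of a PFM built from a handful of motifs. It is not taken from MoRAine: the function names, the toy motif `TTGACA`, the uniform background distribution, and the absence of any small-sample correction are all assumptions made for this example.

```python
import math
from collections import Counter

# Translation table for complementing DNA bases (illustrative helper).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the motif as it would read on the opposite strand."""
    return seq.translate(COMPLEMENT)[::-1]


def information_content(motifs):
    """Sum of per-column information (bits) over equal-length motifs.

    Assumes a uniform background (maximum of 2 bits per column) and
    applies no small-sample correction.
    """
    n = len(motifs)
    total = 0.0
    for column in zip(*motifs):
        counts = Counter(column)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        total += 2.0 - entropy
    return total


# Four copies of one toy motif, all stored in the same orientation ...
consistent = ["TTGACA"] * 4
# ... versus the same sites with two of them stored on the other strand.
mixed = ["TTGACA", "TTGACA",
         reverse_complement("TTGACA"), reverse_complement("TTGACA")]

print(information_content(consistent))
print(information_content(mixed))
```

In this toy case the consistently oriented set yields a fully conserved matrix, while the mixed set loses information in every column where the motif and its reverse complement disagree.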
We present MoRAine, an algorithm that re-annotates transcription factor binding motifs. Each experimentally supported motif underlying a PFM is compared against every other such motif. The goal is to re-annotate the TFBMs, where necessary by switching their strands and shifting them by a few positions, so as to maximize the information content of the resulting adjusted PFM. We present two heuristic strategies to perform this optimization and subsequently show that MoRAine significantly improves the corresponding sequence logos. Furthermore, we justify the method by evaluating the specificity, sensitivity, true positive rate, and false positive rate of PFM-based TFBM predictions for E. coli using the original database motifs and the MoRAine-adjusted motifs. Classification performance increases considerably when MoRAine is used as a preprocessing step.
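The optimization goal can be sketched with a simple greedy hill climb. This is a hypothetical illustration, not either of the two heuristics used by MoRAine, which the abstract does not specify: it only considers strand switches (position shifts are omitted for brevity), assumes equal-length motifs and a uniform background, and all function names are invented for this example.

```python
import math
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the motif as it would read on the opposite strand."""
    return seq.translate(COMPLEMENT)[::-1]


def information_content(motifs):
    """Sum of per-column information (bits), uniform background assumed."""
    n = len(motifs)
    total = 0.0
    for column in zip(*motifs):
        counts = Counter(column)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        total += 2.0 - entropy
    return total


def greedy_reorient(motifs):
    """Flip one motif at a time, keeping a flip only if it raises the
    information content, until no single flip helps (a local optimum)."""
    current = list(motifs)
    best = information_content(current)
    improved = True
    while improved:
        improved = False
        for i in range(len(current)):
            candidate = list(current)
            candidate[i] = reverse_complement(current[i])
            score = information_content(candidate)
            if score > best + 1e-9:  # tolerance guards against float noise
                current, best = candidate, score
                improved = True
    return current, best


# Toy input: the second site was annotated on the opposite strand.
annotated = ["TTGACA", "TGTCAA", "TTGACA", "TTGACA"]
adjusted, score = greedy_reorient(annotated)
print(adjusted, score)
```

A greedy search of this kind can stop in a local optimum, and an all-forward and an all-reverse orientation score identically, so the result depends on the starting annotation.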
MoRAine is integrated into a publicly available web server and can be used online or downloaded as a stand-alone version from http://moraine.cebitec.uni-bielefeld.de.