Quandt K, Frech K, Karas H, Wingender E, Werner T
Institut für Säugetiergenetik, GSF-Forschungszentrum für Umwelt und Gesundheit GmbH, Neuherberg, Germany.
Nucleic Acids Res. 1995 Dec 11;23(23):4878-84. doi: 10.1093/nar/23.23.4878.
The identification of potential regulatory motifs in new sequence data is increasingly important for experimental design. Those motifs are commonly located by matches to IUPAC strings derived from consensus sequences. Although this method is simple and widely used, a major drawback of IUPAC strings is that they necessarily remove much of the information originally present in the set of sequences. Nucleotide distribution matrices retain most of the information and are thus better suited to evaluate new potential sites. However, sufficiently large libraries of pre-compiled matrices are a prerequisite for practical application of any matrix-based approach and are just beginning to emerge. Here we present a set of tools for molecular biologists that allows generation of new matrices and detection of potential sequence matches by automatic searches with a library of pre-compiled matrices. We also supply a large library (> 200) of transcription factor binding site matrices that has been compiled on the basis of published matrices as well as entries from the TRANSFAC database, with emphasis on sequences with experimentally verified binding capacity. Our search method includes position weighting of the matrices based on the information content of individual positions and calculates a relative matrix similarity. We show several examples suggesting that this matrix similarity is useful in estimating the functional potential of matrix matches and thus provides a valuable basis for designing appropriate experiments.
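As an illustration of the contrast drawn above, the following is a minimal sketch and not the authors' implementation. It compares a yes/no IUPAC match with a position-weighted relative matrix score. Several things here are assumptions, because the abstract does not give them: the toy binding sites and the IUPAC string `TGASTMA` are invented for the example, the function names are hypothetical, the per-position weight is the standard Shannon information content in bits, and the relative score is the weighted score of a candidate divided by the best weighted score the matrix can reach.

```python
import math

# IUPAC nucleotide codes mapped to the bases each code allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def iupac_match(pattern, site):
    """Binary test: does every base of `site` fit the IUPAC code at its position?"""
    return len(pattern) == len(site) and all(
        base in IUPAC[code] for code, base in zip(pattern, site)
    )

def count_matrix(sites):
    """Build a nucleotide distribution matrix (one base-count dict per position)."""
    return [
        {base: sum(site[i] == base for site in sites) for base in "ACGT"}
        for i in range(len(sites[0]))
    ]

def information_content(column):
    """Shannon information content of one position in bits (assumed weighting)."""
    total = sum(column.values())
    bits = 2.0  # maximum for four equally likely bases
    for count in column.values():
        if count > 0:
            p = count / total
            bits += p * math.log2(p)
    return bits

def matrix_similarity(matrix, site):
    """Weighted score of `site` relative to the best score the matrix allows."""
    weights = [information_content(column) for column in matrix]
    score = sum(w * col[base] for w, col, base in zip(weights, matrix, site))
    best = sum(w * max(col.values()) for w, col in zip(weights, matrix))
    return score / best if best > 0 else 0.0

# Hypothetical aligned sites, invented for this example (not data from the paper).
toy_sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGAGTAA"]
toy_matrix = count_matrix(toy_sites)
toy_iupac = "TGASTMA"  # assumed IUPAC string summarising the toy sites

for candidate in ["TGACTCA", "TGACTAA", "CGACTCA"]:
    print(candidate,
          iupac_match(toy_iupac, candidate),
          round(matrix_similarity(toy_matrix, candidate), 3))
```

In this sketch the IUPAC string accepts the first two candidates without distinguishing between them and rejects the third outright. The matrix score ranks all three, and a mismatch at a fully conserved position lowers the score more than a rare base at a variable position.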