Kato Mamoru, Tsunoda Tatsuhiko
Laboratory for Medical Informatics, SNP Research Center, RIKEN, Yokohama, Japan.
BMC Bioinformatics. 2007 Mar 22;8:100. doi: 10.1186/1471-2105-8-100.
Gene expression in eukaryotes often requires a combination of multiple types of transcription factors and cis-regulatory elements, and this combinatorial regulation confers tissue- or environment-specific expression. To reveal combinatorial regulation, computational methods have been developed that efficiently infer combinations of cis-regulatory motifs important for gene expression as measured by DNA microarrays. One promising approach is regression analysis between expression levels and the scores of motifs in input sequences. This approach takes full advantage of the information in expression levels because it does not require dichotomizing each gene's expression level according to whether it reaches a certain threshold. However, there is no web-based tool that employs regression methods to systematically search for motif combinations and that can practically handle combinations of more than two or three motifs.
Here we introduce MotifCombinator, an online tool with a user-friendly interface that systematically searches for combinations composed of any number of motifs based on regression methods. The tool uses well-known regression methods (multivariate linear regression, multivariate adaptive regression splines or MARS, and multivariate logistic regression) for this purpose, and uses a genetic algorithm to search for combinations composed of any desired number of motifs. The visualization systems in the tool help users intuitively grasp the process of the combination search, and the backup system allows users to easily stop and restart calculations that are expected to require long computation times. The tool also provides the preparatory steps needed for a systematic combination search, namely selecting the single motifs that constitute combinations and removing redundant similar motifs based on clustering analysis.
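As a rough illustration of the idea, the sketch below pairs a linear regression of expression levels on motif scores with a small genetic algorithm that searches for a motif combination of a fixed size. It is not MotifCombinator's implementation. The function names, the use of R^2 as the fitness measure, the crossover and mutation scheme, and all parameter values are assumptions made only for this example.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(scores, expression, motifs):
    """R^2 of a least-squares fit of expression on the chosen motif scores.

    scores[g][j] is the score of motif j in the sequence of gene g
    (hypothetical input layout); an intercept term is always included.
    """
    rows = [[1.0] + [gene[j] for j in motifs] for gene in scores]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, expression)) for i in range(k)]
    try:
        beta = solve_linear_system(xtx, xty)
    except ValueError:
        return 0.0  # collinear motif scores: treat as uninformative
    mean_y = sum(expression) / len(expression)
    ss_tot = sum((y - mean_y) ** 2 for y in expression)
    if ss_tot == 0.0:
        return 0.0
    ss_res = sum(
        (y - sum(b * v for b, v in zip(beta, r))) ** 2
        for r, y in zip(rows, expression)
    )
    return 1.0 - ss_res / ss_tot


def search_combination(scores, expression, size,
                       generations=40, pop_size=30, seed=0):
    """Genetic search for a motif combination of the given size (assumed scheme)."""
    rng = random.Random(seed)
    n_motifs = len(scores[0])

    def fitness(combo):
        return r_squared(scores, expression, combo)

    population = [tuple(sorted(rng.sample(range(n_motifs), size)))
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # keep the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            pool = sorted(set(a) | set(b))
            child = set(rng.sample(pool, size))  # crossover: mix both parents
            if rng.random() < 0.3:  # mutation: swap one motif for a new one
                child.discard(rng.choice(sorted(child)))
                candidates = [j for j in range(n_motifs) if j not in child]
                child.add(rng.choice(candidates))
            children.append(tuple(sorted(child)))
        population = parents + children
    return max(population, key=fitness)
```

Calling `search_combination(scores, expression, size=2)` returns a sorted tuple of motif indices. Because expression levels enter the regression as continuous values, no thresholding of genes into "expressed" and "not expressed" is needed. Swapping `r_squared` for another model's goodness-of-fit would change the regression method without touching the search.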
MotifCombinator helps users to systematically search for motif combinations that play an important role in gene expression as measured by microarrays.