K.A. Timiryazev Institute of Plant Physiology RAS, 35 Botanicheskaya Str., Moscow 127276, Russia.
Int J Mol Sci. 2024 Feb 5;25(3):1903. doi: 10.3390/ijms25031903.
The concept of cis-regulatory modules located in gene promoters represents the current view of how gene transcriptional regulation is organized. Such modules are combinations of two or more single, short DNA motifs. The bioinformatic identification of such modules is an NP-hard problem of extreme computational complexity; therefore, simplifications, assumptions, and heuristics are usually deployed to tackle it. In practice, this has two consequences: many parameters must be set before the search, and the search yields only locally optimal results. Here, a novel method, BestCRM, is presented, aimed at identifying the cis-regulatory elements in gene promoters based on an exhaustive search of all feasible module configurations. All required parameters are automatically estimated using positive and negative datasets. To be computationally efficient, the search is accelerated using a multidimensional hash function, allowing it to complete in a few hours on a regular laptop (for example, an Intel i7 CPU at 3.2 GHz with 32 GB RAM). Tests on an established benchmark and on real data show better performance of BestCRM compared to the available methods according to several metrics, including specificity, sensitivity, and AUC. A great practical advantage of the method is its minimal number of input parameters: apart from the positive and negative promoters, only the desired level of module presence in promoters is required.
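To make the idea of an exhaustive, hash-indexed module search more concrete, the following minimal Python sketch enumerates every pair of fixed-length motifs separated by a bounded gap and indexes each pair in a hash table under a composite key. It is an illustration only and is not the BestCRM algorithm: the abstract does not describe the hash function or the scoring, so the function names, the key layout `(motif_a, motif_b, gap)`, the parameters `k`, `max_gap`, and `min_presence`, and the score (fraction of positive promoters minus fraction of negative promoters) are all assumptions made for this example.

```python
from collections import defaultdict


def pair_modules(seq, k=4, max_gap=10):
    """Return the set of (motif_a, motif_b, gap) keys found in one sequence.

    Hypothetical helper: a "module" here is two exact k-mers separated by
    0..max_gap bases. The tuple acts as a composite hash key.
    """
    found = set()
    n = len(seq)
    for i in range(n - k + 1):
        motif_a = seq[i:i + k]
        for gap in range(max_gap + 1):
            j = i + k + gap  # start of the second motif
            if j + k > n:
                break
            found.add((motif_a, seq[j:j + k], gap))
    return found


def count_presence(promoters, k=4, max_gap=10):
    """Count, for every module key, how many promoters contain it."""
    counts = defaultdict(int)
    for seq in promoters:
        # A set per sequence ensures each promoter is counted once per key.
        for key in pair_modules(seq, k, max_gap):
            counts[key] += 1
    return counts


def best_modules(positive, negative, min_presence=0.5, k=4, max_gap=10):
    """Rank modules by (positive fraction - negative fraction).

    Assumes both promoter lists are non-empty. Only modules present in at
    least `min_presence` of the positive promoters are kept.
    """
    pos = count_presence(positive, k, max_gap)
    neg = count_presence(negative, k, max_gap)
    scored = []
    for key, count in pos.items():
        pos_frac = count / len(positive)
        if pos_frac < min_presence:
            continue
        neg_frac = neg.get(key, 0) / len(negative)
        scored.append((pos_frac - neg_frac, key))
    # Best score first; ties are broken by the key for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored


if __name__ == "__main__":
    # Toy data: every positive sequence carries ACGT, two bases, then GGCA.
    positive = ["ACGTTTGGCA", "TTACGTAAGGCA", "ACGTCCGGCATT"]
    negative = ["ACGTACGTACGT", "GGCATTTTGGCA"]
    for score, module in best_modules(positive, negative, min_presence=1.0):
        print(score, module)
```

In this toy version the only user-supplied threshold that matters is `min_presence`, which loosely mirrors the "desired level of module presence" mentioned above; a realistic implementation would also need degenerate motif matching and a far more compact index than a Python dictionary.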