Database of Potential Promoter Sequences in the Genome.

Author information

Rudenko Valentina, Korotkov Eugene

Affiliation

Institute of Bioengineering, Research Center of Biotechnology of the Russian Academy of Sciences, Moscow 119071, Russia.

Publication information

Biology (Basel). 2022 Jul 26;11(8):1117. doi: 10.3390/biology11081117.

Abstract

In this study, we used a mathematical method for the multiple alignment of highly divergent sequences (MAHDS) to create a database of potential promoter sequences (PPSs) in the genome. To search for PPSs, 20 statistically significant classes of sequences located in the range from -499 to +100 nucleotides near the annotated genes were calculated. For each class, a position-weight matrix (PWM) was computed and then used to identify PPSs in the genome. In total, 825,136 PPSs were detected, with a false positive rate of 0.13%. The PPSs obtained with the MAHDS method were tested using TSSFinder, which detects transcription start sites. The databank of the found PPSs provides their coordinates in chromosomes, the alignment of each PPS with the PWM, and the level of statistical significance as a normal distribution argument, and can be used in genetic engineering and biotechnology.
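The PWM-based search summarized above can be illustrated with a minimal sketch. It is not the MAHDS implementation: it assumes a log-odds matrix against a uniform background, an ungapped sliding-window scan, and a shuffling-based null model for the normal distribution argument (Z-score). The function names `build_pwm`, `best_score`, and `z_score` and all parameter values are hypothetical, chosen only for illustration.

```python
import math
import random

BASES = "ACGT"

def build_pwm(seqs, pseudocount=1.0):
    """Log-odds PWM from equal-length sequences (uniform background is an assumption)."""
    pwm = []
    for pos in range(len(seqs[0])):
        counts = {b: pseudocount for b in BASES}
        for s in seqs:
            counts[s[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / 0.25) for b in BASES})
    return pwm

def best_score(pwm, seq):
    """Highest ungapped PWM score over all windows of seq (no indels, unlike a true alignment)."""
    width = len(pwm)
    return max(
        sum(pwm[i][seq[start + i]] for i in range(width))
        for start in range(len(seq) - width + 1)
    )

def z_score(pwm, seq, n_shuffles=200, seed=0):
    """Z-score of the observed best score against shuffled copies of seq (assumed null model)."""
    rng = random.Random(seed)
    observed = best_score(pwm, seq)
    letters = list(seq)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        null.append(best_score(pwm, "".join(letters)))
    mean = sum(null) / len(null)
    var = sum((x - mean) ** 2 for x in null) / (len(null) - 1)
    return (observed - mean) / math.sqrt(var) if var > 0 else 0.0

# Toy training set; the study's classes cover the -499 to +100 region, not 6-mers.
toy_pwm = build_pwm(["TATAAA", "TATAAT", "TATATA"])
candidate = "GCGCGCTATAAAGCGCGC"
print(best_score(toy_pwm, candidate), z_score(toy_pwm, candidate))
```

In this sketch a candidate would be reported as a potential promoter sequence when its Z-score exceeds a chosen threshold; the threshold that yields the 0.13% false positive rate reported above is not given in the abstract and is not reproduced here.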

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6d30/9332048/4ac2d3c5624b/biology-11-01117-g001.jpg
