ncPro-ML: An integrated computational tool for identifying non-coding RNA promoters in multiple species.

Author information

Tang Qiang, Nie Fulei, Kang Juanjuan, Chen Wei

Affiliations

Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China.

Center for Genomics and Computational Biology, School of Life Sciences, North China University of Science and Technology, Tangshan 063210, China.

Publication information

Comput Struct Biotechnol J. 2020 Sep 10;18:2445-2452. doi: 10.1016/j.csbj.2020.09.001. eCollection 2020.

Abstract

A promoter is located near the transcription start site and regulates transcription initiation of the gene. Accurate identification of promoters is essential for understanding the mechanism of gene regulation. Since experimental methods are costly and inefficient, developing efficient and accurate computational tools to identify promoters is necessary. Although a series of methods have been proposed for identifying promoters, none of them is able to identify the promoters of non-coding RNA (ncRNA). In the present work, a new method called ncPro-ML is proposed to identify ncRNA promoters in Homo sapiens and Mus musculus, in which different kinds of sequence encoding schemes are used to convert DNA sequences into feature vectors. To test the effect of sequence length, datasets containing sequences of different lengths were built for each species. The results demonstrate that ncPro-ML achieved its best performance on the dataset with a sequence length of 221 nucleotides for both human and mouse. The performance of ncPro-ML was also satisfactory in both the independent dataset test and the cross-species test. The results indicate that the proposed predictor can serve as a powerful tool for the discovery of ncRNA promoters. In addition, a web server for ncPro-ML was developed, which can be freely accessed at http://www.bio-bigdata.cn/ncPro-ML/.
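
To make the idea of a sequence encoding scheme concrete, the following is a minimal sketch of one widely used scheme, normalized k-mer frequency. The abstract does not say which encodings ncPro-ML uses, so the choice of k-mer frequency, the function name `kmer_frequencies`, and the parameter `k` are assumptions made purely for illustration, not the authors' method.

```python
from itertools import product


def kmer_frequencies(seq, k=2):
    """Encode a DNA sequence as a fixed-length vector of k-mer frequencies.

    Illustrative only: the encoding schemes actually used by ncPro-ML
    are not specified in the abstract.
    """
    seq = seq.upper()
    # All 4**k possible k-mers, in a fixed lexicographic order.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        # Skip windows containing N or other ambiguity codes.
        if window in counts:
            counts[window] += 1
            total += 1
    # Normalize so that sequences of different lengths are comparable.
    return [counts[m] / total if total else 0.0 for m in kmers]


vector = kmer_frequencies("ACGTACGTAC", k=2)
print(len(vector))  # 16 features for k=2
```

Because the vector length depends only on `k` (4**k entries), sequences of different lengths, such as the variable-length datasets mentioned above, map to feature vectors of the same dimension.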

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ce92/7509369/f167d95205d4/ga1.jpg
