A machine learning technique for identifying DNA enhancer regions utilizing CIS-regulatory element patterns.

Affiliations

Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan.

Department of Computer, College of Science and Arts in Ar Rass, Qassim University, Ar Rass, Saudi Arabia.

Publication information

Sci Rep. 2022 Sep 7;12(1):15183. doi: 10.1038/s41598-022-19099-3.

DOI:10.1038/s41598-022-19099-3

PMID:36071071

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9452539/

Abstract

Enhancers regulate gene expression and so play a crucial role in the synthesis of RNAs and proteins, although they do not themselves encode proteins or RNA molecules. Controlling gene expression therefore depends on predicting enhancers and their strength. Because enhancers can lie far from their target genes, lack common motifs, and are tissue- and cell-specific, they are considered difficult to predict from DNA sequence. A number of bioinformatics tools have recently been developed to distinguish enhancers from other regulatory elements and to assess their strength. However, the prediction quality of these tools still needs to improve, which limits their practical value. The current study proposes a novel method, based on nucleotide composition and statistical moment-based features, for distinguishing enhancers from non-enhancers and for evaluating enhancer strength. Under fivefold and tenfold cross-validation, the proposed method outperformed state-of-the-art techniques in accuracy, reaching 86.5% for enhancer site prediction and 72.3% for enhancer strength prediction. These results suggest that statistical moment-based features can lead to more efficient and more successful prediction. The source code of the current study is available to the research community at https://github.com/csbioinfopk/enpred.
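
To make the two feature families named in the abstract more concrete, the following is a minimal Python sketch of nucleotide (k-mer) composition and position-weighted statistical moments computed from a DNA sequence. It is not the authors' implementation, which is in the repository linked above. The integer encoding of nucleotides, the function names, and the particular definitions of raw and central moments used here are assumptions chosen for illustration only.

```python
from collections import Counter
from itertools import product

# Hypothetical integer encoding of nucleotides (an assumption, not from the paper).
ENCODING = {"A": 1, "C": 2, "G": 3, "T": 4}


def kmer_composition(seq, k=1):
    """Relative frequency of every possible k-mer over the alphabet ACGT."""
    seq = seq.upper()
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not windows:
        raise ValueError("sequence is shorter than k")
    counts = Counter(windows)
    kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return {kmer: counts[kmer] / len(windows) for kmer in kmers}


def encode(seq):
    """Map a DNA string to a list of integers using ENCODING."""
    try:
        return [ENCODING[base] for base in seq.upper()]
    except KeyError as exc:
        raise ValueError(f"unsupported nucleotide: {exc.args[0]!r}") from None


def raw_moment(values, order):
    """Raw moment: sum of (position ** order) * value, with 1-based positions."""
    return sum((i ** order) * v for i, v in enumerate(values, start=1))


def central_moment(values, order):
    """Central moment: like raw_moment, but positions are taken about the centroid."""
    centroid = raw_moment(values, 1) / raw_moment(values, 0)
    return sum(((i - centroid) ** order) * v for i, v in enumerate(values, start=1))


def feature_vector(seq, k=2):
    """Concatenate k-mer composition with a few raw and central moments."""
    composition = kmer_composition(seq, k)
    encoded = encode(seq)
    features = [composition[kmer] for kmer in sorted(composition)]
    features += [raw_moment(encoded, n) for n in range(3)]
    features += [central_moment(encoded, n) for n in (2, 3)]
    return features
```

For a sequence of any length, `feature_vector(seq, k=2)` returns a fixed-length list of 21 numbers (16 dinucleotide frequencies, 3 raw moments, 2 central moments), which is the kind of fixed-size input a classifier evaluated with fivefold or tenfold cross-validation would need.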
