Identifying Transcription Factors That Prefer Binding to Methylated DNA Using Reduced g-Gap Dipeptide Composition.

Authors

Nguyen Quang H, Tran Hoang V, Nguyen Binh P, Do Trang T T

Affiliations

School of Information and Communication Technology, Hanoi University of Science and Technology, 1 Dai Co Viet, Hanoi 100000, Vietnam.

School of Mathematics and Statistics, Victoria University of Wellington, Kelburn Parade, Wellington 6140, New Zealand.

Publication information

ACS Omega. 2022 Aug 30;7(36):32322-32330. doi: 10.1021/acsomega.2c03696. eCollection 2022 Sep 13.

Abstract

Transcription factors (TFs) play an important role in gene expression and in regulating 3D genome conformation. TFs can bind to specific DNA fragments called enhancers and promoters. Some TFs bind to promoter fragments near the transcription initiation site and form complexes that allow polymerase enzymes to bind and initiate transcription. Previous studies showed that DNA methylation can inhibit or prevent TFs from binding to DNA fragments. However, recent studies have found that some TFs can bind to methylated DNA fragments. Identifying these TFs is an important stepping stone toward a better understanding of cellular gene expression mechanisms. Because experimental methods are often time-consuming and labor-intensive, developing computational methods is essential. In this study, we propose two machine learning methods for two problems: (1) identifying TFs and (2) identifying TFs that prefer binding to methylated DNA targets (TFPMs). For the TF identification problem, the proposed method uses the position-specific scoring matrix for data representation and a deep convolutional neural network for modeling. This method achieved 90.56% sensitivity, 83.96% specificity, and an area under the receiver operating characteristic curve (AUC) of 0.9596 on an independent test set. For the TFPM identification problem, we propose using the reduced g-gap dipeptide composition for data representation and the support vector machine algorithm for modeling. This method achieved 82.61% sensitivity, 64.86% specificity, and an AUC of 0.8486 on another independent test set. These results are higher than those reported by other studies on the same problems.
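
To make the feature representation more concrete, the following is a minimal sketch of how a reduced g-gap dipeptide composition could be computed for a protein sequence. It is not the authors' implementation. The abstract does not state which reduced amino acid alphabet or which gap values were used, so the six-group mapping in `GROUPS`, the function names, and the default `g` are all illustrative assumptions.

```python
from itertools import product

# ASSUMPTION: a hypothetical 6-group reduced alphabet based on broad
# physicochemical classes. The grouping used in the paper is not given
# in the abstract.
GROUPS = {
    "a": "AVLIMC",  # aliphatic / hydrophobic
    "b": "FWYH",    # aromatic
    "c": "STNQ",    # polar, uncharged
    "d": "KR",      # positively charged
    "e": "DE",      # negatively charged
    "f": "GP",      # glycine and proline
}
RESIDUE_TO_GROUP = {res: label for label, members in GROUPS.items() for res in members}

# Fixed ordering of the 6 x 6 = 36 reduced dipeptides ("aa", "ab", ..., "ff").
PAIRS = ["".join(p) for p in product(sorted(GROUPS), repeat=2)]


def reduce_sequence(seq):
    """Map each standard residue to its group label; skip unknown symbols."""
    return "".join(RESIDUE_TO_GROUP[r] for r in seq.upper() if r in RESIDUE_TO_GROUP)


def g_gap_dipeptide_composition(seq, g=0):
    """Return the frequency of each reduced residue pair separated by g positions.

    g = 0 gives ordinary adjacent dipeptides; g = 1 skips one residue, and so on.
    """
    reduced = reduce_sequence(seq)
    n_pairs = len(reduced) - g - 1
    if n_pairs <= 0:
        return [0.0] * len(PAIRS)
    counts = dict.fromkeys(PAIRS, 0)
    for i in range(n_pairs):
        counts[reduced[i] + reduced[i + g + 1]] += 1
    return [counts[p] / n_pairs for p in PAIRS]


if __name__ == "__main__":
    features = g_gap_dipeptide_composition("MKRDE", g=1)
    print(len(features), sum(features))
```

Each sequence becomes a fixed-length vector (36 values under this assumed alphabet) regardless of protein length, which is the kind of input a support vector machine classifier such as scikit-learn's `SVC` expects. Reducing the alphabet shrinks the feature space from the 400 dipeptides of the full 20-letter alphabet, which may help when the training set is small.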

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1be7/9475634/1da6799236de/ao2c03696_0002.jpg
