A successful hybrid deep learning model aiming at promoter identification.

Affiliation

Systems Engineering Institute, Xi'an Jiaotong University, Xi'an, China.

Publication information

BMC Bioinformatics. 2022 May 31;23(Suppl 1):206. doi: 10.1186/s12859-022-04735-6.

Abstract

BACKGROUND

The zone adjacent to a transcription start site (TSS), namely, the promoter, is primarily involved in the process of DNA transcription initiation and regulation. As a result, proper promoter identification is critical for further understanding the mechanism of the networks controlling genomic regulation. A number of methodologies for the identification of promoters have been proposed. Nonetheless, due to the great heterogeneity existing in promoters, the results of these procedures are still unsatisfactory. In order to establish additional discriminative characteristics and properly recognize promoters, we developed the hybrid model for promoter identification (HMPI), a hybrid deep learning model that can characterize both the native sequences of promoters and the morphological outline of promoters at the same time. We developed the HMPI to combine a method called the PSFN (promoter sequence features network), which characterizes native promoter sequences and deduces sequence features, with a technique referred to as the DSPN (deep structural profiles network), which is specially structured to model the promoters in terms of their structural profile and to deduce their structural attributes.
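The two-view idea described above can be illustrated with a minimal sketch of the inputs that a sequence branch and a structural branch might consume. This is not the authors' implementation: the function names, the one-hot encoding, and the dinucleotide lookup are assumptions made for illustration, and the property values in the table are placeholders rather than measured structural parameters.

```python
from itertools import product

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as rows of 0/1 flags in the order A, C, G, T."""
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]

# Placeholder table (hypothetical values, NOT real measurements):
# each of the 16 dinucleotides gets its index scaled into [0, 1].
DINUC_PROPERTY = {
    "".join(pair): i / 15 for i, pair in enumerate(product(BASES, repeat=2))
}

def structural_profile(seq):
    """Map every overlapping dinucleotide to its property value."""
    seq = seq.upper()
    return [DINUC_PROPERTY[seq[i:i + 2]] for i in range(len(seq) - 1)]

def hybrid_inputs(seq):
    """Return the two views that a two-branch model could take as input."""
    return {"sequence": one_hot(seq), "structure": structural_profile(seq)}

views = hybrid_inputs("TATAAT")
print(len(views["sequence"]), len(views["structure"]))
```

In a real model, each view would feed its own network and the resulting feature vectors would be combined before classification. A sequence of length n yields n one-hot rows but only n - 1 profile values, because the profile is computed over overlapping dinucleotides.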

RESULTS

The HMPI was applied to human, plant and Escherichia coli K-12 strain datasets, and the findings showed that the HMPI was successful at extracting the features of promoters while greatly enhancing promoter identification performance. In addition, after synthetic sampling, transfer learning and label smoothing regularization were added, the improved HMPI models achieved good results in identifying subtypes of promoters on prokaryotic promoter datasets.
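Label smoothing regularization, one of the improvements named above, can be sketched in its standard form: the one-hot target is mixed with a uniform distribution over the K classes. The abstract does not state the smoothing factor or the number of promoter subtypes, so `epsilon=0.1` and four classes below are assumptions chosen only for illustration.

```python
import math

def smooth_labels(true_class, num_classes, epsilon=0.1):
    """Standard label smoothing: (1 - eps) * one_hot + eps / K."""
    uniform = epsilon / num_classes
    return [
        (1.0 - epsilon) * (1.0 if k == true_class else 0.0) + uniform
        for k in range(num_classes)
    ]

def cross_entropy(target, predicted):
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)

# Hypothetical example: 4 subtypes, true class index 2.
target = smooth_labels(true_class=2, num_classes=4, epsilon=0.1)
predicted = [0.05, 0.05, 0.85, 0.05]
loss = cross_entropy(target, predicted)
```

With smoothing, the loss also penalizes predictions that put almost no probability on the non-target classes, which discourages overconfident outputs on small or imbalanced subtype datasets.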

CONCLUSIONS

The results showed that the HMPI was successful at extracting the features of promoters while greatly enhancing the performance of identifying promoters on both eukaryotic and prokaryotic datasets, and the improved HMPI models are good at identifying subtypes of promoters on prokaryotic promoter datasets. The HMPI is additionally adaptable to different biological functional sequences, allowing for the addition of new features or models.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/15bd/9158169/5a1c24374113/12859_2022_4735_Fig1_HTML.jpg
