Explainable artificial intelligence as a reliable annotator of archaeal promoter regions.

Author information

Programa de Pós-Graduação em Biotecnologia, Universidade de Caxias do Sul, Caxias do Sul, RS, Brazil.

Unidad Académica de Yucatán, Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Yucatán, Mérida, Mexico.

Publication information

Sci Rep. 2023 Jan 31;13(1):1763. doi: 10.1038/s41598-023-28571-7.

Abstract

Archaea are a vast and unexplored cellular domain that thrives in a wide diversity of environments and plays central roles in processes mediating global carbon and nutrient fluxes. For these organisms to balance their metabolism, appropriate regulation of their gene expression is essential. A key moment in regulating the genes responsible for life maintenance in archaea is the binding of transcription factor proteins to the promoter element. This DNA segment is conserved, which enables its exploration by machine learning techniques. Here, we trained and tested a support vector machine with 3935 known archaeal promoter sequences. All promoter sequences were encoded as DNA Duplex Stability values. Afterwards, we performed a model interpretation task to map the decision pattern of the classification procedure. We also used a dataset of known promoter sequences for validation. Our results showed that an AT-rich region around position -27, upstream of the transcription start site (TSS), is the most conserved in the analyzed organisms. In addition, we were able to identify the BRE element (-33), the PPE (at -10) and a position at +3, which together provide a more understandable picture of how promoters are organized in all the archaeal organisms. Finally, we used the interpreted model to identify potential promoter sequences in 135 unannotated organisms, delivering regulatory region annotation of archaea at a scale never accomplished before ( https://pcyt.unam.mx/gene-regulation/ ). We consider that this approach will be useful for understanding how gene regulation is achieved in other organisms, beyond the already established transcription factor binding sites.
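
The encoding step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that "DNA Duplex Stability" means replacing each overlapping dinucleotide with a nearest-neighbor free energy. The abstract does not give the table the authors used, so the values below are approximate nearest-neighbor free energies (kcal/mol) included for illustration only, and the names `NN_FREE_ENERGY`, `encode_duplex_stability` and `mean_profile` are hypothetical.

```python
# Illustrative nearest-neighbor free energies (kcal/mol) for each dinucleotide step.
# Assumption: approximate values for illustration; not taken from the paper.
NN_FREE_ENERGY = {
    "AA": -1.00, "TT": -1.00,
    "AT": -0.88, "TA": -0.58,
    "CA": -1.45, "TG": -1.45,
    "GT": -1.44, "AC": -1.44,
    "CT": -1.28, "AG": -1.28,
    "GA": -1.30, "TC": -1.30,
    "CG": -2.17, "GC": -2.24,
    "GG": -1.84, "CC": -1.84,
}


def encode_duplex_stability(sequence):
    """Map a DNA sequence of length n to n - 1 dinucleotide stability values."""
    seq = sequence.upper()
    # Slide a window of width 2 over the sequence; unknown bases raise KeyError.
    return [NN_FREE_ENERGY[seq[i:i + 2]] for i in range(len(seq) - 1)]


def mean_profile(sequences):
    """Average the encoded values position by position over aligned sequences."""
    encoded = [encode_duplex_stability(s) for s in sequences]
    length = len(encoded[0])
    if any(len(e) != length for e in encoded):
        raise ValueError("all sequences must have the same length")
    return [sum(column) / len(encoded) for column in zip(*encoded)]


if __name__ == "__main__":
    # Toy sequences, not real promoters: a TATA-like stretch is less stable
    # (free energy closer to zero) than a GC-rich one.
    print(encode_duplex_stability("TATA"))
    print(encode_duplex_stability("GCGC"))
    print(mean_profile(["AAAA", "TTTT"]))
```

Under these assumptions, AT-rich stretches such as the region around -27 appear as runs of values closer to zero, which is the kind of numeric signal a classifier such as a support vector machine could pick up from fixed-length encoded vectors.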

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ed24/9889792/971a551d0a60/41598_2023_28571_Fig1_HTML.jpg
