Predicting transcriptional responses to heat and drought stress from genomic features using a machine learning approach in rice.

Authors

Smet Dajo, Opdebeeck Helder, Vandepoele Klaas

Affiliations

Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium.

Center for Plant Systems Biology, Vlaams Instituut voor Biotechnologie (VIB), Ghent, Belgium.

Publication information

Front Plant Sci. 2023 Jul 17;14:1212073. doi: 10.3389/fpls.2023.1212073. eCollection 2023.

Abstract

Plants have evolved various mechanisms to adapt to adverse environmental stresses, such as the modulation of gene expression. Expression of stress-responsive genes is controlled by specific regulators, including transcription factors (TFs), that bind to sequence-specific binding sites, which are key components of cis-regulatory elements and regulatory networks. Our understanding of the underlying regulatory code remains, however, incomplete. Recent studies have shown that, by training machine learning (ML) algorithms on genomic sequence features, it is possible to predict which genes will transcriptionally respond to a specific stress. By identifying the most important features for gene expression prediction, these trained ML models can, in theory, help to further elucidate the regulatory code underlying the transcriptional response to abiotic stress. Here, we trained random forest ML models to predict gene expression in rice (Oryza sativa) in response to heat or drought stress. In addition to thoroughly assessing model performance and robustness across various input training data, we evaluated the importance of promoter and gene body sequence features for training ML models. The use of enriched promoter oligomers, complementing known TF binding sites, allowed us to gain novel insights into DNA motifs contributing to the stress regulatory code. By comparing genomic feature importance scores for drought and heat stress over time, we identified general and stress-specific genomic features contributing to the performance of the learned models, as well as their temporal variation. This study provides a solid foundation for building and interpreting ML models that accurately predict transcriptional responses and enables novel insights into biological sequence features that are important for abiotic stress responses.
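
The idea of "enriched promoter oligomers" as model features can be illustrated with a small sketch. The code below is not the authors' pipeline: the gene names, sequences, k-mer length, pseudocount, and the simple frequency-ratio score are all assumptions chosen for illustration. It records which k-mers occur in each promoter (collapsing reverse complements) and compares how often each k-mer appears in stress-responsive versus non-responsive genes.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Collapse a k-mer and its reverse complement into one representative."""
    reverse_complement = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_presence(seq, k):
    """Return the set of canonical k-mers that occur in a sequence."""
    seq = seq.upper()
    found = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip windows with N or other symbols
            found.add(canonical(kmer))
    return found


def enrichment(promoters, responsive, k=4, pseudocount=0.5):
    """Ratio of k-mer presence frequency: responsive vs. non-responsive genes.

    The pseudocount keeps the ratio finite when a k-mer is absent from
    one of the two groups (an illustrative choice, not from the paper).
    """
    pos, neg = Counter(), Counter()
    n_pos = n_neg = 0
    for gene, seq in promoters.items():
        kmers = kmer_presence(seq, k)
        if gene in responsive:
            pos.update(kmers)
            n_pos += 1
        else:
            neg.update(kmers)
            n_neg += 1
    scores = {}
    for kmer in set(pos) | set(neg):
        f_pos = (pos[kmer] + pseudocount) / (n_pos + 2 * pseudocount)
        f_neg = (neg[kmer] + pseudocount) / (n_neg + 2 * pseudocount)
        scores[kmer] = f_pos / f_neg
    return scores


# Hypothetical toy promoters; real inputs would be upstream genomic sequences.
promoters = {
    "geneA": "TTACGTGGCA",
    "geneB": "GGACGTGTTT",
    "geneC": "TTTTAAAATT",
    "geneD": "AATTTTAAAA",
}
responsive = {"geneA", "geneB"}  # assumed stress-responsive labels

scores = enrichment(promoters, responsive, k=4)
for kmer, score in sorted(scores.items(), key=lambda item: -item[1])[:3]:
    print(kmer, round(score, 2))
```

In a setting like the one described above, the per-gene presence sets for the enriched k-mers could serve as binary input features for a random forest classifier, whose feature importance scores would then indicate which oligomers contribute most to the prediction.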

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fbeb/10390317/faf23ea29711/fpls-14-1212073-g001.jpg
