LOGOWheat: deep learning-based prediction of regulatory effects for noncoding variants in wheats.

Authors

Kong Lingpeng, Cheng Hong, Zhu Kun, Song Bo

Affiliations

Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, No. 97 Buxin Road, Dapeng New District, Shenzhen 518124, China.

State Key Laboratory of Crop Stress Adaptation and Improvement, School of Life Sciences, Henan University, No. 379 Mingli Road (North Section), Zhengzhou 450046, China.

Publication information

Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbae705.

Abstract

Identifying the regulatory effects of noncoding variants remains a significant challenge. The recent accumulation of epigenomic profiling data in wheat provides an opportunity to model the functional impact of these variants. In this study, we introduce Language of Genome for Wheat (LOGOWheat), a deep learning-based tool designed to predict the regulatory effects of noncoding variants in wheat. LOGOWheat first uses a self-attention-based, contextualized pretrained language model to learn bidirectional representations of the unlabeled wheat reference genome. Epigenomic profiling data are then used to fine-tune the model, enabling it to discern the regulatory code inherent in genomic sequences. Test results indicate that LOGOWheat predicts multiple chromatin features effectively, achieving an average area under the receiver operating characteristic curve (AUROC) of 0.8531 and an average area under the precision-recall curve (AUPRC) of 0.7633. Two case studies demonstrate the main functions of LOGOWheat: scoring and prioritizing causal variants within a given variant set, and constructing an in silico saturated mutagenesis map to discover high-impact sites or functional motifs in a given sequence. Finally, we propose extracting potential functional variants from the wheat population by integrating evolutionary conservation information. LOGOWheat is available at http://logowheat.cn/.
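The two functions named in the abstract, in silico saturated mutagenesis and variant prioritization, can be illustrated with a minimal sketch. This is not LOGOWheat's implementation: `toy_score` is a hypothetical stand-in for a trained model's chromatin-feature prediction (it only counts an arbitrarily chosen motif, `TGACG`), and the function names `saturation_mutagenesis`, `rank_variants`, and `position_impact` are assumptions introduced here for illustration.

```python
from typing import Callable, Dict, List, Tuple

BASES = "ACGT"
Variant = Tuple[int, str]  # (0-based position, alternative base)


def toy_score(seq: str) -> float:
    """Hypothetical stand-in for a model prediction: counts a fixed motif."""
    motif = "TGACG"  # arbitrary motif, chosen only for this example
    last_start = len(seq) - len(motif)
    return float(sum(seq[i:i + len(motif)] == motif for i in range(last_start + 1)))


def saturation_mutagenesis(
    seq: str, score: Callable[[str], float]
) -> Dict[Variant, float]:
    """Score every single-base substitution relative to the reference sequence."""
    ref_score = score(seq)
    effects: Dict[Variant, float] = {}
    for pos, ref_base in enumerate(seq):
        for alt in BASES:
            if alt == ref_base:
                continue  # skip the reference allele itself
            mutant = seq[:pos] + alt + seq[pos + 1:]
            effects[(pos, alt)] = score(mutant) - ref_score
    return effects


def rank_variants(effects: Dict[Variant, float]) -> List[Tuple[Variant, float]]:
    """Order variants by absolute predicted effect, largest first."""
    return sorted(effects.items(), key=lambda item: abs(item[1]), reverse=True)


def position_impact(effects: Dict[Variant, float], length: int) -> List[float]:
    """Largest absolute effect at each position, to highlight high-impact sites."""
    impact = [0.0] * length
    for (pos, _alt), delta in effects.items():
        impact[pos] = max(impact[pos], abs(delta))
    return impact


sequence = "AATGACGTT"
effects = saturation_mutagenesis(sequence, toy_score)
ranked = rank_variants(effects)
impact = position_impact(effects, len(sequence))
```

With a real model, `toy_score` would be replaced by the predicted probability of a chromatin feature for the input window, and the per-position impact profile would be read as a map of candidate functional sites.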
