A Model-Based Approach for Identifying Functional Intergenic Transcribed Regions and Noncoding RNAs.

Author information

Department of Plant Biology, Michigan State University, East Lansing, MI.

Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI.

Publication information

Mol Biol Evol. 2018 Jun 1;35(6):1422-1436. doi: 10.1093/molbev/msy035.

DOI:10.1093/molbev/msy035

PMID:29554332

Abstract

With advances in transcript profiling, the presence of transcriptional activities in intergenic regions has been well established. However, whether intergenic expression reflects transcriptional noise or activity of novel genes remains unclear. We identified intergenic transcribed regions (ITRs) in 15 diverse flowering plant species and found that the amount of intergenic expression correlates with genome size, a pattern that could be expected if intergenic expression is largely nonfunctional. To further assess the functionality of ITRs, we first built machine learning models using Arabidopsis thaliana as a model that accurately distinguish functional sequences (benchmark protein-coding and RNA genes) and likely nonfunctional ones (pseudogenes and unexpressed intergenic regions) by integrating 93 biochemical, evolutionary, and sequence-structure features. Next, by applying the models genome-wide, we found that 4,427 ITRs (38%) and 796 annotated ncRNAs (44%) had features significantly similar to benchmark protein-coding or RNA genes and thus were likely parts of functional genes. Approximately 60% of ITRs and ncRNAs were more similar to nonfunctional sequences and were likely transcriptional noise. The predictive framework established here provides not only a comprehensive look at how functional, genic sequences are distinct from likely nonfunctional ones, but also a new way to differentiate novel genes from genomic regions with noisy transcriptional activities.
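The two-step workflow described in the abstract (train a classifier on benchmark functional and likely nonfunctional sequences, then score ITRs) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not name the learning algorithm, so plain logistic regression stands in for it; the data are synthetic draws rather than real genomic features; 5 toy features stand in for the 93 used in the study; and the names `simulate`, `score`, and `train` are hypothetical.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible
N_FEATURES = 5          # toy stand-in for the 93 features in the abstract

def simulate(mean, n):
    """Draw n synthetic feature vectors centered on `mean` (hypothetical data)."""
    return [[rng.gauss(mean, 1.0) for _ in range(N_FEATURES)] for _ in range(n)]

def score(w, b, x):
    """Logistic score in (0, 1); higher means closer to the functional benchmarks."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

def train(X, y, lr=0.1, epochs=200):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    w = [0.0] * N_FEATURES
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * N_FEATURES
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = score(w, b, xi) - yi  # gradient of the log-loss w.r.t. z
            for j, xij in enumerate(xi):
                grad_w[j] += err * xij
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Step 1: train on benchmark classes (1 = functional, 0 = likely nonfunctional).
functional = simulate(1.0, 100)
nonfunctional = simulate(-1.0, 100)
X = functional + nonfunctional
y = [1] * 100 + [0] * 100
w, b = train(X, y)

# Accuracy on the training data itself; a real analysis would hold out a test set.
accuracy = sum((score(w, b, x) >= 0.5) == bool(label) for x, label in zip(X, y)) / len(X)

# Step 2: score unlabeled "ITRs" (a synthetic 40/60 mixture) and count those
# that look more like the functional benchmarks.
itrs = simulate(1.0, 40) + simulate(-1.0, 60)
n_functional_like = sum(score(w, b, x) >= 0.5 for x in itrs)

print(f"training accuracy: {accuracy:.2f}")
print(f"functional-like ITRs: {n_functional_like} of {len(itrs)}")
```

The 0.5 cutoff is an arbitrary choice for this sketch. The abstract reports only that 38% of ITRs and 44% of annotated ncRNAs were significantly similar to benchmark genes, not how that threshold was set.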
