Improved annotation of 3' untranslated regions and complex loci by combination of strand-specific direct RNA sequencing, RNA-Seq and ESTs.

Author information

Schurch Nicholas J, Cole Christian, Sherstnev Alexander, Song Junfang, Duc Céline, Storey Kate G, McLean W H Irwin, Brown Sara J, Simpson Gordon G, Barton Geoffrey J

Affiliations

Division of Computational Biology, University of Dundee, Dundee, United Kingdom; Division of Biological Chemistry and Drug Discovery, University of Dundee, Dundee, United Kingdom; Centre for Gene Regulation and Expression, University of Dundee, Dundee, United Kingdom.

Division of Computational Biology, University of Dundee, Dundee, United Kingdom.

Publication information

PLoS One. 2014 Apr 10;9(4):e94270. doi: 10.1371/journal.pone.0094270. eCollection 2014.

DOI: 10.1371/journal.pone.0094270

PMID: 24722185

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3983147/

Abstract

The reference annotations made for a genome sequence provide the framework for all subsequent analyses of the genome. Correct and complete annotation, in addition to the underlying genomic sequence, is particularly important when interpreting the results of RNA-Seq experiments, where short sequence reads are mapped against the genome and assigned to genes according to the annotation. Inconsistencies in annotations between the reference and the experimental system can lead to incorrect interpretation of the effect on RNA expression of an experimental treatment or mutation in the system under study. Until recently, the genome-wide annotation of 3' untranslated regions received less attention than coding regions and the delineation of intron/exon boundaries. In this paper, data produced for samples in human, chicken and A. thaliana by the novel single-molecule, strand-specific Direct RNA Sequencing technology from Helicos Biosciences, which locates 3' polyadenylation sites to within ±2 nt, were combined with archival EST and RNA-Seq data. Nine examples are illustrated where this combination of data allowed: (1) gene and 3' UTR re-annotation (including extension of one 3' UTR by 5.9 kb); (2) disentangling of gene expression in complex regions; (3) clearer interpretation of small RNA expression; and (4) identification of novel genes. While the specific examples displayed here may become obsolete as genome sequences and their annotations are refined, the principles laid out in this paper will be of general use both to those annotating genomes and to those seeking to interpret existing publicly available annotations in the context of their own experimental data.
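
To make the abstract's point about annotation-dependent read assignment concrete, the following is a minimal Python sketch, not the authors' pipeline. It assumes a toy model in which a gene is a single stranded interval and each read or poly(A) site is a (strand, position) pair; the names `Gene`, `extend_three_prime_end`, `count_reads` and the `max_extension` parameter are hypothetical, and all coordinates are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Toy gene model: one interval on one strand (1-based, inclusive)."""
    name: str
    strand: str  # "+" or "-"
    start: int
    end: int


def extend_three_prime_end(gene, polya_sites, max_extension=10_000):
    """Return a gene whose 3' end reaches the most distal same-strand
    poly(A) site found within `max_extension` nt downstream of the
    currently annotated 3' end. The gene is returned unchanged if no
    such site exists."""
    if gene.strand == "+":
        # On the plus strand, downstream means larger coordinates.
        sites = [p for s, p in polya_sites
                 if s == "+" and gene.end < p <= gene.end + max_extension]
        return Gene(gene.name, "+", gene.start, max(sites)) if sites else gene
    # On the minus strand, downstream means smaller coordinates.
    sites = [p for s, p in polya_sites
             if s == "-" and gene.start - max_extension <= p < gene.start]
    return Gene(gene.name, "-", min(sites), gene.end) if sites else gene


def count_reads(gene, reads):
    """Count reads on the gene's strand whose position lies in the gene."""
    return sum(1 for s, p in reads
               if s == gene.strand and gene.start <= p <= gene.end)


# Made-up example: the annotated gene ends at 2000, but a same-strand
# poly(A) site at 3200 suggests a longer 3' UTR.
gene = Gene("geneA", "+", 1000, 2000)
polya_sites = [("+", 3200), ("-", 2500)]
reads = [("+", 1500), ("+", 2600), ("+", 3100), ("-", 1800)]

extended = extend_three_prime_end(gene, polya_sites)
print(count_reads(gene, reads), count_reads(extended, reads))
```

In this toy case the same reads give a different count for the gene once its 3' end is extended, which is the kind of interpretation shift the abstract warns about. Real annotation work would also need to handle spliced transcripts, overlapping genes and alternative poly(A) sites, which this sketch deliberately ignores.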

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a5cf/3983147/04bdfbc1a4e0/pone.0094270.g001.jpg
