Full-length messenger RNA sequences greatly improve genome annotation.

Author information

Haas Brian J, Volfovsky Natalia, Town Christopher D, Troukhan Maxim, Alexandrov Nickolai, Feldmann Kenneth A, Flavell Richard B, White Owen, Salzberg Steven L

Affiliation

The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, USA.

Publication information

Genome Biol. 2002;3(6):RESEARCH0029. doi: 10.1186/gb-2002-3-6-research0029. Epub 2002 May 30.

DOI:10.1186/gb-2002-3-6-research0029

PMID:12093376

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC116726/

Abstract

BACKGROUND

Annotation of eukaryotic genomes is a complex endeavor that requires the integration of evidence from multiple, often contradictory, sources. With the ever-increasing amount of genome sequence data now available, methods for accurate identification of large numbers of genes have become urgently needed. In an effort to create a set of very high-quality gene models, we used the sequence of 5,000 full-length gene transcripts from Arabidopsis to re-annotate its genome. We have mapped these transcripts to their exact chromosomal locations and, using alignment programs, have created gene models that provide a reference set for this organism.

RESULTS

Approximately 35% of the transcripts indicated that previously annotated genes needed modification, and 5% of the transcripts represented newly discovered genes. We also discovered that multiple transcription initiation sites appear to be much more common than previously known, and we report numerous cases of alternative mRNA splicing. We include a comparison of different alignment software and an analysis of how the transcript data improved the previously published annotation.

CONCLUSIONS

Our results demonstrate that sequencing of large numbers of full-length transcripts followed by computational mapping greatly improves identification of the complete exon structures of eukaryotic genes. In addition, we are able to find numerous introns in the untranslated regions of the genes.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eaff/116726/5e2979016721/gb-2002-3-6-research0029-1.jpg
