Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies.

Author information

Haas Brian J, Delcher Arthur L, Mount Stephen M, Wortman Jennifer R, Smith Roger K, Hannick Linda I, Maiti Rama, Ronning Catherine M, Rusch Douglas B, Town Christopher D, Salzberg Steven L, White Owen

Affiliation

The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, USA.

Publication information

Nucleic Acids Res. 2003 Oct 1;31(19):5654-66. doi: 10.1093/nar/gkg770.

DOI: 10.1093/nar/gkg770

PMID: 14500829

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC206470/

Abstract

The spliced alignment of expressed sequence data to genomic sequence has proven a key tool in the comprehensive annotation of genes in eukaryotic genomes. A novel algorithm was developed to assemble clusters of overlapping transcript alignments (ESTs and full-length cDNAs) into maximal alignment assemblies, thereby comprehensively incorporating all available transcript data and capturing subtle splicing variations. Complete and partial gene structures identified by this method were used to improve The Institute for Genomic Research Arabidopsis genome annotation (TIGR release v.4.0). The alignment assemblies permitted the automated modeling of several novel genes and >1000 alternative splicing variations as well as updates (including UTR annotations) to nearly half of the approximately 27 000 annotated protein coding genes. The algorithm of the Program to Assemble Spliced Alignments (PASA) tool is described, as well as the results of automated updates to Arabidopsis gene annotations.
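The idea of assembling overlapping, mutually compatible spliced alignments can be illustrated with a small sketch. This is not the PASA algorithm itself, which the abstract names but does not specify; it is a simplified greedy version under stated assumptions. Each alignment is a list of exon intervals on one strand, two alignments count as compatible when they overlap and imply the same introns across their shared span, and the names `introns`, `compatible`, `merge`, and `assemble` are hypothetical.

```python
from typing import List, Tuple

Exon = Tuple[int, int]      # inclusive genomic coordinates (start, end)
Alignment = List[Exon]      # exons sorted by start, all on the same strand


def introns(aln: Alignment) -> List[Exon]:
    """Return the gaps between consecutive exons as (start, end) intervals."""
    return [(a_end + 1, b_start - 1)
            for (_, a_end), (b_start, _) in zip(aln, aln[1:])]


def compatible(a: Alignment, b: Alignment) -> bool:
    """True if a and b overlap and agree on every intron touching the shared span."""
    lo = max(a[0][0], b[0][0])
    hi = min(a[-1][1], b[-1][1])
    if lo > hi:
        return False  # the two alignments do not overlap at all

    def shared(aln: Alignment) -> set:
        # Introns that intersect the region covered by both alignments.
        return {i for i in introns(aln) if i[0] <= hi and i[1] >= lo}

    return shared(a) == shared(b)


def merge(a: Alignment, b: Alignment) -> Alignment:
    """Union the exons of two compatible alignments into one structure."""
    exons = sorted(a + b)
    merged = [exons[0]]
    for start, end in exons[1:]:
        last_start, last_end = merged[-1]
        if start <= last_end:
            merged[-1] = (last_start, max(last_end, end))
        else:
            merged.append((start, end))
    return merged


def assemble(alignments: List[Alignment]) -> List[Alignment]:
    """Greedy sketch (an assumption, not the published method): extend every
    compatible assembly with each alignment; start a new assembly otherwise."""
    assemblies: List[Alignment] = []
    for aln in sorted(alignments):
        placed = False
        for i, asm in enumerate(assemblies):
            if compatible(asm, aln):
                assemblies[i] = merge(asm, aln)
                placed = True
        if not placed:
            assemblies.append(list(aln))
    return assemblies
```

In this sketch, an alignment that is compatible with several assemblies extends each of them, so a fragment lying in exons shared by two splicing variants contributes to both. A greedy pass like this does not guarantee maximal assemblies in general; it only illustrates how incompatible intron structures keep splicing variants in separate assemblies.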
