Experimental validation of novel genes predicted in the un-annotated regions of the Arabidopsis genome.

Author information

Moskal William A, Wu Hank C, Underwood Beverly A, Wang Wei, Town Christopher D, Xiao Yongli

Affiliation

The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, Maryland 20850, USA.

Publication information

BMC Genomics. 2007 Jan 17;8:18. doi: 10.1186/1471-2164-8-18.

DOI:10.1186/1471-2164-8-18

PMID:17229318

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC1783852/

Abstract

BACKGROUND

Several lines of evidence support the existence of novel genes and other transcribed units that have not yet been annotated in the Arabidopsis genome. Two gene prediction programs that make use of comparative genomic analysis, Twinscan and EuGene, have recently been deployed on the Arabidopsis genome. Their ability to use sequence data from other species has allowed both Twinscan and EuGene to predict over 1000 genes that are intergenic with respect to the most recent annotation release. A high-throughput RACE pipeline was used in an attempt to verify the structure and expression of these novel genes.

RESULTS

RACE targeted 1,071 un-annotated loci, and full-length sequence coverage was obtained for 35% of the targeted genes. We have verified the structure and expression of 378 genes that were not present in the most recent release of the Arabidopsis genome annotation. These 378 genes represent a structurally diverse set of transcripts and encode a functionally diverse set of proteins.

CONCLUSION

We have investigated the accuracy of the Twinscan and EuGene gene prediction programs and found them to be reliable predictors of gene structure in Arabidopsis. Several hundred previously un-annotated genes were validated by this work. Based on the information derived from these efforts, it is likely that the Arabidopsis genome annotation still overlooks several hundred protein-coding genes.
