Experimental validation of novel genes predicted in the un-annotated regions of the Arabidopsis genome.

Author information

Moskal William A, Wu Hank C, Underwood Beverly A, Wang Wei, Town Christopher D, Xiao Yongli

Affiliation

The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, Maryland 20850, USA.

Publication information

BMC Genomics. 2007 Jan 17;8:18. doi: 10.1186/1471-2164-8-18.

Abstract

BACKGROUND

Several lines of evidence support the existence of novel genes and other transcribed units that have not yet been annotated in the Arabidopsis genome. Two gene prediction programs that use comparative genomic analysis, Twinscan and EuGene, have recently been deployed on the Arabidopsis genome. Their ability to draw on sequence data from other species has allowed both Twinscan and EuGene to predict over 1,000 genes that are intergenic with respect to the most recent annotation release. A high-throughput RACE pipeline was used in an attempt to verify the structure and expression of these novel genes.

RESULTS

A total of 1,071 un-annotated loci were targeted by RACE, and full-length sequence coverage was obtained for 35% of the targeted genes. We verified the structure and expression of 378 genes that were not present in the most recent release of the Arabidopsis genome annotation. These 378 genes represent a structurally diverse set of transcripts and encode a functionally diverse set of proteins.
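As a quick consistency check on the figures above, the short sketch below compares the reported counts. It assumes that the 378 verified genes correspond to the targeted loci with full-length coverage, which the abstract does not state explicitly; the variable names are illustrative only.

```python
# Counts reported in the abstract.
targeted_loci = 1071    # un-annotated loci targeted by RACE
verified_genes = 378    # genes whose structure and expression were verified

# Assumption (not stated in the abstract): the verified genes are the
# targeted loci for which full-length sequence coverage was obtained.
fraction = verified_genes / targeted_loci

print(f"Verified fraction of targeted loci: {fraction:.1%}")
```

Under that assumption, the ratio rounds to the 35% quoted in the text.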

CONCLUSION

We investigated the accuracy of the Twinscan and EuGene gene prediction programs and found them to be reliable predictors of gene structure in Arabidopsis. Several hundred previously un-annotated genes were validated by this work. Based on the information derived from these efforts, it is likely that the Arabidopsis genome annotation continues to overlook several hundred protein-coding genes.
