Sequencing Medicago truncatula expressed sequenced tags using 454 Life Sciences technology.

Author information

Cheung Foo, Haas Brian J, Goldberg Susanne M D, May Gregory D, Xiao Yongli, Town Christopher D

Affiliation

The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, USA.

Publication information

BMC Genomics. 2006 Oct 24;7:272. doi: 10.1186/1471-2164-7-272.

Abstract

BACKGROUND

In this study, we addressed whether a single 454 Life Sciences GS20 sequencing run on a normalized cDNA library yields new gene discovery, and whether the short reads produced by this technology are of value in gene structure annotation.

RESULTS

A single 454 GS20 sequencing run on adapter-ligated cDNA from a normalized cDNA library generated 292,465 reads; cleaning reduced these to 252,384 reads with an average length of 92 nucleotides. Clustering and assembly produced 184,599 unique sequences containing over 400 SSRs. The 454 sequences hit more genes than a comparable amount of sequence from MtGI. Although short, the 454 reads are long enough to map to a unique genome location as effectively as longer ESTs produced by conventional sequencing. Functional interpretation by Gene Ontology (GO) assignments, based on matches to Arabidopsis, showed that the sequences cover a broad range of GO categories. A total of 53,796 assemblies and singletons (29%) had no match in the existing MtGI. Among these previously unobserved Medicago transcripts, thousands had matches in a comprehensive protein database and one or more of the TIGR Plant Gene Indices. Approximately 20% of these novel sequences could be found in the Medicago genome sequence. A total of 70,026 reads generated by the 454 technology were mapped to 785 finished Medicago BACs using PASA, and over 1,000 gene models required modification. In parallel with the 454 sequencing, 4,445 5' reads were generated by conventional sequencing from the same library; the assembled sequences showed that the library contains about 52% full-length cDNAs encoding proteins from 50 to over 500 amino acids in length.
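The counts reported above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the input numbers are taken from the abstract, while the derived values (fraction of reads retained, approximate total nucleotides, mean mapped reads per BAC) are not stated in the source. The total is approximate because the 92-nucleotide read length is a rounded average, and the per-BAC figure is a simple mean that says nothing about how reads were distributed across BACs.

```python
# Figures quoted in the abstract.
RAW_READS = 292_465           # reads from the single GS20 run
CLEAN_READS = 252_384         # reads remaining after cleaning
MEAN_READ_LENGTH_NT = 92      # rounded average length of cleaned reads
UNIQUE_SEQUENCES = 184_599    # assemblies plus singletons
NOVEL_SEQUENCES = 53_796      # sequences with no match in MtGI
MAPPED_READS = 70_026         # reads mapped to finished BACs
FINISHED_BACS = 785


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Derived values (not stated in the abstract).
removed_reads = RAW_READS - CLEAN_READS
retained_pct = percent(CLEAN_READS, RAW_READS)
novel_pct = percent(NOVEL_SEQUENCES, UNIQUE_SEQUENCES)
approx_total_nt = CLEAN_READS * MEAN_READ_LENGTH_NT
mean_reads_per_bac = MAPPED_READS / FINISHED_BACS

print(f"Reads removed by cleaning: {removed_reads}")
print(f"Reads retained: {retained_pct:.1f}%")
print(f"Sequences without an MtGI match: {novel_pct:.1f}%")
print(f"Approximate cleaned sequence: {approx_total_nt / 1e6:.1f} Mb")
print(f"Mean mapped reads per BAC: {mean_reads_per_bac:.1f}")
```

The computed share of sequences without an MtGI match rounds to the 29% given in the abstract.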

CONCLUSION

Because of the large number of reads it affords, 454 DNA sequencing technology is effective in revealing the expression of transcripts from a broad range of GO categories, including many rare transcripts in normalized cDNA libraries, although only a limited portion of each transcript's sequence is uncovered. As with longer ESTs, 454 reads can be mapped uniquely onto genomic sequence to provide support for, and modifications of, gene predictions.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/29d2/1635983/4c40d22124f5/1471-2164-7-272-1.jpg
