The Pinus taeda genome is characterized by diverse and highly diverged repetitive sequences.

Affiliation

Section of Evolution and Ecology, University of California, Davis, CA 95616, USA.

Publication information

BMC Genomics. 2010 Jul 7;11:420. doi: 10.1186/1471-2164-11-420.

Abstract

BACKGROUND

In today's age of genomic discovery, no attempt has been made to comprehensively sequence a gymnosperm genome. The largest genus in the coniferous family Pinaceae is Pinus, whose 110-120 species have extremely large genomes (c. 20-40 Gb, 2N = 24). The size and complexity of these genomes have prompted much speculation as to the feasibility of completing a conifer genome sequence. Conifer genomes are reputed to be highly repetitive, but there is little information available on the nature and identity of repetitive units in gymnosperms. The pines have extensive genetic resources, with approximately 329,000 ESTs from eleven species and genetic maps in eight species, including a dense genetic map of the twelve linkage groups in Pinus taeda.

RESULTS

We present here the Sanger sequence and annotation of ten P. taeda BAC clones and Genome Analyzer II whole genome shotgun (WGS) sequences representing 7.5% of the genome. Computational annotation of the ten BACs predicts three putative protein-coding genes and at least fifteen likely pseudogenes in nearly one megabase of sequence. We found three conifer-specific LTR retroelements in the BACs, and tentatively identified at least 15 others based on evidence from the distantly related angiosperms. Alignment of WGS sequences to the BACs indicates that 80% of BAC sequences have similar copies (≥ 75% nucleotide identity) elsewhere in the genome, but only 23% have identical copies (99% identity). The three most common repetitive elements in the genome were identified and, when combined, represent less than 5% of the genome.

CONCLUSIONS

This study indicates that the majority of repeats in the P. taeda genome are 'novel' and will therefore require additional BAC or genomic sequencing for accurate characterization. The pine genome contains a very large number of diverged and probably defunct repetitive elements. This study also provides new evidence that sequencing a pine genome using a WGS approach is a feasible goal.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5466/2996948/1f2374d4843c/1471-2164-11-420-1.jpg
