

Insights into the loblolly pine genome: characterization of BAC and fosmid sequences.

Affiliation

Department of Plant Sciences, University of California Davis, Davis, California, United States of America.

Publication information

PLoS One. 2013 Sep 4;8(9):e72439. doi: 10.1371/journal.pone.0072439. eCollection 2013.

Abstract

Despite the prevalence and importance of loblolly pine, Norway spruce, and white spruce, three ecologically and economically important conifer species, their genome sequences are just becoming available to the research community. Following the completion of these large assemblies, annotation efforts will be undertaken to characterize the reference sequences. Accurate annotation of these ancient genomes would be aided by a comprehensive repeat library; however, few studies have generated enough sequence to fully evaluate and catalog their non-genic content. In this paper, two sets of loblolly pine genomic sequence, 103 previously assembled BACs and 90,954 newly sequenced and assembled fosmid scaffolds, were analyzed. Together, this sequence represents 280 Mbp (roughly 1% of the loblolly pine genome) and one of the most comprehensive studies of repetitive elements and genes in a gymnosperm species. A combination of homology and de novo methodologies was applied to identify both conserved and novel repeats. Similarity analysis estimated a repetitive content of 27% that included both full and partial elements. When combined with the de novo investigation, the estimate increased to almost 86%. Over 60% of the repetitive sequence consists of full or partial LTR (long terminal repeat) retrotransposons. Through de novo approaches, 6,270 novel, full-length transposable element families and 9,415 sub-families were identified. Among those 6,270 families, 82% were annotated as single-copy. Several of the novel, high-copy families are described here, with the largest, PtPiedmont, comprising 133 full-length copies. In addition to repeats, analysis of the coding region reported 23 full-length eukaryotic orthologous proteins (KOGs) and another 29 novel or orthologous genes. These discoveries, along with other genomic resources, will be used to annotate conifer genomes and address long-standing questions about gymnosperm evolution.
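As a rough aid to reading the percentages above, the sketch below converts them into approximate absolute quantities. It is not part of the paper's analysis: the variable names are illustrative, the inputs are the rounded figures quoted in the abstract ("almost 86%", "over 60%"), and the results are therefore approximations or lower bounds rather than reported values.

```python
# Back-of-the-envelope conversion of the abstract's percentages.
# All inputs are the rounded figures quoted in the abstract; names are
# illustrative assumptions, not identifiers from the paper.

TOTAL_SEQUENCE_MBP = 280        # BACs plus fosmid scaffolds analyzed
HOMOLOGY_REPEAT_FRACTION = 0.27  # similarity-based estimate
COMBINED_REPEAT_FRACTION = 0.86  # "almost 86%" with de novo methods added
LTR_SHARE_OF_REPEATS = 0.60      # "over 60%", so treated as a lower bound
NOVEL_FAMILIES = 6270            # novel full-length TE families
SINGLE_COPY_FRACTION = 0.82      # share of families annotated single-copy

# Repetitive sequence implied by each estimate, in Mbp.
homology_repeat_mbp = round(TOTAL_SEQUENCE_MBP * HOMOLOGY_REPEAT_FRACTION, 1)
combined_repeat_mbp = round(TOTAL_SEQUENCE_MBP * COMBINED_REPEAT_FRACTION, 1)

# Lower bound on sequence in full or partial LTR retrotransposons, in Mbp.
ltr_min_mbp = round(combined_repeat_mbp * LTR_SHARE_OF_REPEATS, 1)

# Approximate split of the novel families by copy number.
single_copy_families = round(NOVEL_FAMILIES * SINGLE_COPY_FRACTION)
multi_copy_families = NOVEL_FAMILIES - single_copy_families

if __name__ == "__main__":
    print(f"Repeats (homology only): ~{homology_repeat_mbp} Mbp")
    print(f"Repeats (homology + de novo): ~{combined_repeat_mbp} Mbp")
    print(f"LTR retrotransposons: at least ~{ltr_min_mbp} Mbp")
    print(f"Single-copy families: ~{single_copy_families}")
    print(f"Multi-copy families: ~{multi_copy_families}")
```

Because the source percentages are rounded, these figures should be read as orders of magnitude only; the exact counts are in the paper itself.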


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3cad/3762812/657ae253bc76/pone.0072439.g001.jpg
