Whole-exome targeted sequencing of the uncharacterized pine genome.

Author information

Graduate Program in Plant Molecular and Cellular Biology, University of Florida, Gainesville, FL 32611, USA.

School of Forest Resources and Conservation, University of Florida, Gainesville, FL 32611, USA.

Publication information

Plant J. 2013 Jul;75(1):146-156. doi: 10.1111/tpj.12193. Epub 2013 May 7.

Abstract

The large genome size of many species hinders the development and application of genomic tools to study them. For instance, loblolly pine (Pinus taeda L.), an ecologically and economically important conifer, has a large and yet uncharacterized genome of 21.7 Gbp. To characterize the pine genome, we performed exome capture and sequencing of 14 729 genes derived from an assembly of expressed sequence tags. Efficiency of sequence capture was evaluated and shown to be similar across samples with increasing levels of complexity, including haploid cDNA, haploid genomic DNA and diploid genomic DNA. However, this efficiency was severely reduced for probes that overlapped multiple exons, presumably because intron sequences hindered probe:exon hybridizations. Such regions could not be entirely avoided during probe design, because of the lack of a reference sequence. To improve the throughput and reduce the cost of sequence capture, a method to multiplex the analysis of up to eight samples was developed. Sequence data showed that multiplexed capture was reproducible among 24 haploid samples, and can be applied for high-throughput analysis of targeted genes in large populations. Captured sequences were de novo assembled, resulting in 11 396 expanded and annotated gene models, significantly improving the knowledge about the pine gene space. Interspecific capture was also evaluated: over 98% of all probes designed from P. taeda that were efficient in sequence capture were also suitable for analysis of the related species Pinus elliottii Engelm.
