A Long-Read Transcriptome Assembly of Cotton (Gossypium hirsutum L.) and Intraspecific Single Nucleotide Polymorphism Discovery.

Author information

Ashrafi Hamid, Hulse-Kemp Amanda M, Wang Fei, Yang S Samuel, Guan Xueying, Jones Don C, Matvienko Marta, Mockaitis Keithanne, Chen Z Jeffrey, Stelly David M, Van Deynze Allen

Affiliations

Univ. of California-Davis, Dep. of Plant Sciences and Seed Biotechnology Center, One Shields Ave., Davis, CA, 95616.

Texas A&M Univ., Dep. of Soil and Crop Sciences, College Station, TX, 77843.

Publication information

Plant Genome. 2015 Jul;8(2):eplantgenome2014.10.0068. doi: 10.3835/plantgenome2014.10.0068.

Abstract

Upland cotton (Gossypium hirsutum L.) has a narrow germplasm base, which constrains marker development and hampers intraspecific breeding. A pressing need exists for high-throughput single nucleotide polymorphism (SNP) markers that can be readily applied to germplasm in breeding and breeding-related research programs. Despite progress made in developing new sequencing technologies during the past decade, the cost of sequencing remains substantial when one is dealing with numerous samples and large genomes. Several strategies have been proposed to lower the cost of sequencing for multiple genotypes of large-genome species like cotton, such as transcriptome sequencing and reduced-representation DNA sequencing. This paper reports the development of a transcriptome assembly of the inbred line Texas Marker-1 (TM-1), a genetic standard for cotton, its usefulness as a reference for RNA sequencing (RNA-seq)-based SNP identification, and the availability of transcriptome sequences of four other cotton cultivars. An assembly of TM-1 was made using Roche 454 transcriptome reads combined with an assembly of all available public expressed sequence tag (EST) sequences of TM-1. The TM-1 assembly consists of 72,450 contigs with a total of 70 million bp. Functional predictions of the transcripts were estimated by alignment to selected protein databases. Transcriptome sequences of the five lines, including TM-1, were obtained using an Illumina Genome Analyzer-II, and the short reads were mapped to the TM-1 assembly to discover SNPs among the five lines. We identified >14,000 unfiltered allelic SNPs, of which ∼3,700 SNPs were retained for assay development after applying several rigorous filters. This paper reports availability of the reference transcriptome assembly and shows its utility in developing intraspecific SNP markers in upland cotton.
