De novo assembly of the pepper transcriptome (Capsicum annuum): a benchmark for in silico discovery of SNPs, SSRs and candidate genes.

Affiliation

Seed Biotechnology Center, University of California, Davis, 1 Shields Ave, Davis, CA 95616, USA.

Publication

BMC Genomics. 2012 Oct 30;13:571. doi: 10.1186/1471-2164-13-571.

Abstract

BACKGROUND

Molecular breeding of pepper (Capsicum spp.) can be accelerated by developing DNA markers associated with transcriptomes in breeding germplasm. Before the advent of next-generation sequencing (NGS) technologies, most sequencing data were generated by the Sanger method. By leveraging Sanger EST data, we have generated a wealth of genetic information for pepper, including thousands of SNPs and Single Position Polymorphic (SPP) markers. To complement and enhance these resources, we applied NGS to three pepper genotypes, Maor, Early Jalapeño and Criollo de Morelos-334 (CM334), to identify SNPs and SSRs in the assembly of these three genotypes.

RESULTS

Two pepper transcriptome assemblies were developed for different purposes. The first reference sequence, assembled with CAP3, comprises 31,196 contigs built from >125,000 Sanger EST sequences derived mainly from the Korean F1-hybrid line Bukang. Overlapping probes were designed for 30,815 unigenes to construct a pepper Affymetrix GeneChip® microarray for whole-genome analyses. In addition, custom Python scripts were used to identify 4,236 SNPs in contigs of this assembly. A total of 2,489 simple sequence repeats (SSRs) were identified from the assembly, and primers were designed for them. Annotation of contigs with Blast2GO provided information for 60% of the unigenes in the assembly.

The second transcriptome assembly was constructed from more than 200 million Illumina Genome Analyzer II reads (80-120 nt) using a combination of the Velvet, CLC workbench and CAP3 software packages. BWA, SAMtools and in-house Perl scripts were used to identify SNPs among the three pepper genotypes. SNPs were filtered so that each retained SNP lies at least 50 bp from any intron-exon junction and from any flanking SNP. More than 22,000 high-quality putative SNPs were identified. Using the MISA software, 10,398 SSR markers were also identified within the Illumina transcriptome assembly, and primers were designed for them. The assembly was annotated with Blast2GO, and 14,740 (12%) of the annotated contigs were associated with functional proteins.
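Two of the marker-mining steps named above, SSR detection and the 50 bp SNP distance filter, can be illustrated with a minimal Python sketch. It is not the authors' in-house Perl/Python pipeline and not MISA itself: the function names `find_ssrs` and `filter_snps`, the per-motif repeat thresholds in `MIN_REPEATS`, and the assumption that intron-exon junction positions are already known in contig coordinates are all hypothetical choices made for illustration. The SSR scan reports only perfect repeats of 1-6 bp motifs; the SNP filter drops any SNP closer than 50 bp to a junction or to its nearest neighbouring SNP.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical minimum number of tandem copies per motif length (1-6 bp),
# modelled on MISA-style thresholds; an assumption, not the study's settings.
MIN_REPEATS: Dict[int, int] = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq: str) -> List[Tuple[int, str, int]]:
    """Return (start, motif, copies) for perfect SSRs in a contig sequence.

    Simplified sketch: only perfect repeats are reported; MISA's compound
    and interrupted SSR handling is not reproduced here.
    """
    seq = seq.upper()
    hits: List[Tuple[int, str, int]] = []
    for k, min_copies in MIN_REPEATS.items():
        # A k-mer motif followed by at least (min_copies - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit ("ATAT" is "AT").
            if any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k)):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


def filter_snps(snps: List[int], junctions: List[int], min_dist: int = 50) -> List[int]:
    """Keep SNPs at least min_dist bp from every junction and every other SNP.

    Positions are assumed to be 0-based contig coordinates; junction
    positions are assumed to be known in advance (hypothetical input).
    """
    ordered = sorted(snps)
    kept: List[int] = []
    for i, pos in enumerate(ordered):
        near_junction = any(abs(pos - j) < min_dist for j in junctions)
        # In sorted order, the immediate neighbours are the closest flanking SNPs.
        near_prev = i > 0 and pos - ordered[i - 1] < min_dist
        near_next = i + 1 < len(ordered) and ordered[i + 1] - pos < min_dist
        if not (near_junction or near_prev or near_next):
            kept.append(pos)
    return kept


if __name__ == "__main__":
    contig = "GG" + "AT" * 6 + "C" + "A" * 11 + "G"
    print(find_ssrs(contig))  # [(2, 'AT', 6), (15, 'A', 11)]
    print(filter_snps([10, 100, 130, 300, 400], junctions=[280]))  # [10, 400]
```

In the usage example, the SNPs at 100 and 130 are dropped because they flank each other within 50 bp, and the SNP at 300 is dropped because it lies 20 bp from the assumed junction at 280; a rejected SNP still counts as a flanking SNP for its neighbours.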

CONCLUSIONS

Before the pepper genome sequence became available, assembling transcriptomes of this economically important crop was necessary to generate thousands of high-quality molecular markers for use in breeding programs. To better understand the assembled sequences and to identify candidate genes underlying QTLs, we annotated the contigs of the Sanger EST and Illumina transcriptome assemblies. This and other information has been curated in a database dedicated to the pepper project.
