A second generation framework for the analysis of microsatellites in expressed sequence tags and the development of EST-SSR markers for a conifer, Cryptomeria japonica.

Affiliation

Department of Forest Genetics, Forestry and Forest Products Research Institute, 1 Matsunosato, Tsukuba, Ibaraki, 305-8687, Japan.

Publication information

BMC Genomics. 2012 Apr 16;13:136. doi: 10.1186/1471-2164-13-136.

Abstract

BACKGROUND

Microsatellites, or simple sequence repeats (SSRs), in expressed sequence tags (ESTs) are useful resources for genome analysis because of their abundance, functionality and polymorphism. The advent of commercial second generation sequencing machines has led to new strategies for developing EST-SSR markers, necessitating the development of a bioinformatic framework that can keep pace with the increasing quality and quantity of sequence data produced. We describe an open scheme for analyzing ESTs and developing EST-SSR markers from reads collected by Sanger sequencing and pyrosequencing of sugi (Cryptomeria japonica).

RESULTS

We collected 141,097 sequence reads by Sanger sequencing and 1,333,444 by pyrosequencing. After trimming contaminant and low quality sequences, 118,319 Sanger and 1,201,150 pyrosequencing reads were passed to the MIRA assembler, generating 81,284 contigs that were analysed for SSRs. In total, 4,059 SSRs were found in 3,694 (4.54%) contigs, giving an SSR frequency lower than that in seven other plant species with gene indices (5.4-21.9%). The average GC content of the SSR-containing contigs was 41.55%, compared to 40.23% for all contigs. Tri-SSRs were the most common SSRs; the most common motif was AT, which was found in 655 (46.3%) di-SSRs, followed by the AAG motif, found in 342 (25.9%) tri-SSRs. Most (72.8%) tri-SSRs were in coding regions, but 55.6% of the di-SSRs were in non-coding regions; the AT motif was most abundant in 3' untranslated regions. Gene ontology (GO) annotations showed that six GO terms were significantly overrepresented within SSR-containing contigs. Forty-four EST-SSR markers were developed from 192 primer pairs using two pipelines: read2Marker and the newly developed CMiB, which combines several open tools. Markers from the two pipelines did not differ in PCR success rate or polymorphism, but PCR success was significantly affected by the expected PCR product size, and polymorphism by the number of SSR repeats. EST-SSR markers exhibited less polymorphism than genomic SSRs.
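As an illustration of the kind of SSR screening summarised above, the following is a minimal Python sketch that finds perfect di- and tri-nucleotide repeats in a contig with regular expressions and reproduces the reported contig-level SSR frequency. It is not the read2Marker or CMiB pipeline: the function names `find_ssrs` and `ssr_contig_frequency` and the minimum repeat counts (6 for di-SSRs, 5 for tri-SSRs) are assumptions made for this example, since the abstract does not state the thresholds used.

```python
import re

# Hypothetical minimum repeat counts per motif length (assumed for this
# sketch; the abstract does not give the thresholds used in the study).
MIN_REPEATS = {2: 6, 3: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for size, min_rep in sorted(min_repeats.items()):
        # A motif of `size` bases followed by at least min_rep - 1 more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip homopolymer runs such as "AA" or "TTT".
            if len(set(motif)) == 1:
                continue
            hits.append((match.start(), motif, len(match.group(0)) // size))
    return sorted(hits)


def ssr_contig_frequency(n_ssr_contigs, n_contigs):
    """Percentage of contigs that contain at least one SSR."""
    return 100.0 * n_ssr_contigs / n_contigs


# Toy contig: 7 copies of AT and 5 copies of AAG, separated by spacers.
contig = "GGC" + "AT" * 7 + "CCGT" + "AAG" * 5 + "TT"
ssrs = find_ssrs(contig)

# Contig counts taken from the abstract: 3,694 of 81,284 contigs.
frequency = round(ssr_contig_frequency(3694, 81284), 2)
```

In this toy contig, `ssrs` holds one AT repeat and one AAG repeat, and `frequency` matches the 4.54% stated in the text. A real screen would also need to handle compound and imperfect repeats, which this sketch ignores.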

CONCLUSIONS

We have created a new open pipeline for developing EST-SSR markers and applied it in a comprehensive analysis of EST-SSRs and EST-SSR markers in C. japonica. The results will be useful in genomic analyses of conifers and other non-model species.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7166/3424129/16ca0c9ffa03/1471-2164-13-136-1.jpg
