Chopra Ratan, Burow Gloria, Farmer Andrew, Mudge Joann, Simpson Charles E, Burow Mark D
Texas Tech University, Department of Plant and Soil Sciences, Lubbock, TX, 79409, United States of America.
USDA-ARS-CSRL, 3810 4th Street, Lubbock, TX, 79415, United States of America.
PLoS One. 2014 Dec 31;9(12):e115055. doi: 10.1371/journal.pone.0115055. eCollection 2014.
The narrow genetic base of Arachis species and the limited genetic information available for them have hindered marker-assisted selection of peanut cultivars. Recent developments in sequencing technologies, however, have expanded opportunities to exploit genetic resources at lower cost. To use the genetic information available for Arachis species at the transcriptome level, a good-quality reference transcriptome is essential. The available Tifrunner 454 FLEX transcriptome sequences have an assembly with 37,000 contigs and low N50 values of 500-751 bp. We therefore generated de novo transcriptome assemblies from about 38 million reads for the tetraploid cultivar OLin and 16 million reads for each of the diploids, A. duranensis K38901 and A. ipaënsis KGBSPSc30076, using three different de novo assemblers: Trinity, SOAPdenovo-Trans and TransAByss. All three assemblers can use single k-mer analysis, and the latter two also permit multiple k-mer analysis. The assemblies generated for the three samples had N50 values of 1278-1641 bp in Arachis hypogaea (AABB), 1401-1492 bp in Arachis duranensis (AA), and 1107-1342 bp in Arachis ipaënsis (BB). Comparison with legume ESTs and protein databases suggests that the assemblies contained more than 40% full-length transcripts with good continuity. In addition, when the raw reads were mapped back to each assembly, Trinity had a higher success rate in assembling sequences than either TransAByss or SOAPdenovo-Trans. The de novo assembly of OLin had a greater number of contigs (67,098) and a longer contig length (N50 = 1,641 bp) than the Tifrunner transcriptome shotgun assembly (TSA). Although it was built from shorter reads (2 × 50 bp) than the Tifrunner 454 FLEX TSA, the de novo assembly of OLin proved superior. The assemblies, which represent different genome combinations, may serve as a valuable resource for the peanut research community.
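The N50 values quoted above summarize assembly contiguity: N50 is the contig length at which contigs of that length or longer account for at least half of the total assembled bases. The following is a minimal sketch of that standard definition, not the authors' own pipeline; the function name `n50` and the toy contig lengths are hypothetical and serve only as illustration.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig lengths (in bp).

    Contigs are sorted from longest to shortest and their lengths are
    accumulated; N50 is the length of the contig at which the running
    total first reaches half of the total assembled bases.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")

    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length

    # Unreachable for non-empty input; kept so the function always returns.
    return lengths[-1]


if __name__ == "__main__":
    # Hypothetical toy contig lengths, not data from the study.
    toy_lengths = [100, 200, 300, 400, 500]
    print(n50(toy_lengths))
```

In practice the lengths would be read from an assembler's FASTA output. Because N50 depends on which contigs are kept, values are best compared across assemblies filtered with the same minimum contig length.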