National Institute of Plant Genome Research, New Delhi, India.
Plant Biotechnol J. 2013 Sep;11(7):894-905. doi: 10.1111/pbi.12082. Epub 2013 Jun 13.
Genomic resources such as ESTs, molecular markers and linkage maps are essential for crop improvement. However, these resources are still limited in important legumes such as lentil (Lens culinaris Medik.), which is valued worldwide as a rich source of dietary protein. In this study, the de novo transcriptome assembly of 119,855,798 short reads, generated by Illumina paired-end sequencing, was performed using various assembly programs. This resulted in 42,196 nonredundant high-quality transcripts with an average length of 810 bases, an N50 value of 1,432 and an average expression per transcript of 26.21 reads per kilobase per million (RPKM). Similarity searches against the unigenes and protein sequences of other plants showed the highest similarity to soybean. A total of 20,009 nonredundant transcripts showed similarity with the UniProtKB database and, of these, 18,064 transcripts were grouped into three main GO categories, that is, biological process (15,126), molecular function (15,505) and cellular component (9,434). Annotated transcripts were mapped to 289 predicted Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, and 8,893 transcripts were classified into 24 functional categories based on Cluster of Orthologous Groups (COG) of proteins. Mining the data set for SSRs yielded 8,722 SSRs, a frequency of one SSR per 3.92 kb. From these, 5,673 SSR primer pairs were designed, and a subset of these was utilized for diversity analysis. This study, which provides a large data set of annotated transcripts and gene-based SSR markers, would serve as a foundation for various applications in lentil breeding and genetics.
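The summary statistics quoted in the abstract (N50, RPKM, SSR frequency) follow standard definitions, which can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function names `n50`, `rpkm` and `kb_per_ssr` are hypothetical, and the SSR check assumes the total assembly length can be approximated as the transcript count times the rounded mean length (42,196 × 810 bases).

```python
def n50(lengths):
    """Length L such that transcripts of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


def kb_per_ssr(n_transcripts, mean_length_bp, n_ssrs):
    """Approximate kilobases of sequence per SSR (assumes total = count * mean length)."""
    total_bp = n_transcripts * mean_length_bp
    return total_bp / n_ssrs / 1000


if __name__ == "__main__":
    # Figures taken from the abstract; the mean length is rounded, so this is approximate.
    print(round(kb_per_ssr(42_196, 810, 8_722), 2))
```

Under that approximation the assembly spans roughly 34.2 Mb, and dividing by 8,722 SSRs gives about 3.92 kb per SSR, consistent with the reported frequency.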