Max-Delbrück-Center for Molecular Medicine, Berlin Institute for Medical Systems Biology, Robert Rössle Strasse 10, Berlin, Germany.
Genome Res. 2011 Jul;21(7):1193-200. doi: 10.1101/gr.113779.110. Epub 2011 May 2.
Freshwater planaria are a very attractive model system for stem cell biology, tissue homeostasis, and regeneration. The genome of the planarian Schmidtea mediterranea has recently been sequenced and is estimated to contain >20,000 protein-encoding genes. However, the characterization of its transcriptome is far from complete. Furthermore, not a single proteome of the entire phylum has been assayed on a genome-wide level. We devised an efficient sequencing strategy that allowed us to de novo assemble a major fraction of the S. mediterranea transcriptome. We then used independent assays and massive shotgun proteomics to validate the authenticity of transcripts. In total, our de novo assembly yielded 18,619 candidate transcripts with a mean length of 1118 nt after filtering. A total of 17,564 candidate transcripts could be mapped to 15,284 distinct loci on the current genome reference sequence. RACE confirmed complete or almost complete 5' and 3' ends for 22/24 transcripts. The frequencies of frame shifts, fusion, and fission events in the assembled transcripts were computationally estimated to be 4.2%-13%, 0%-3.7%, and 2.6%, respectively. Our shotgun proteomics produced 16,135 distinct peptides that validated 4200 transcripts (FDR ≤1%). The catalog of transcripts assembled in this study, together with the identified peptides, dramatically expands and refines planarian gene annotation, demonstrated by validation of several previously unknown transcripts with stem cell-dependent expression patterns. In addition, our robust transcriptome characterization pipeline could be applied to other organisms without genome assembly. All of our data, including homology annotation, are freely available at SmedGD, the S. mediterranea genome database.
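The counts reported above can be turned into a few derived proportions. The following is only a minimal sketch: the variable names and the `pct` helper are illustrative assumptions, the percentages are simple ratios of the reported counts rather than figures stated by the authors, and it assumes the 17,564 mapped and 4200 peptide-validated transcripts are both subsets of the 18,619 filtered candidates.

```python
# Counts quoted in the abstract (names are illustrative, not the authors').
assembled_transcripts = 18_619   # candidate transcripts after filtering
mapped_transcripts = 17_564      # candidates mapped to the genome reference
distinct_loci = 15_284           # loci hit by the mapped candidates
race_confirmed, race_tested = 22, 24
peptide_validated = 4_200        # transcripts validated by peptides (FDR <= 1%)


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


summary = {
    "mapped_pct": pct(mapped_transcripts, assembled_transcripts),
    "race_confirmed_pct": pct(race_confirmed, race_tested),
    "peptide_validated_pct": pct(peptide_validated, assembled_transcripts),
    # Average number of mapped candidate transcripts per locus.
    "transcripts_per_locus": round(mapped_transcripts / distinct_loci, 2),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Under these assumptions, roughly 94% of the candidates map to the reference genome, about 92% of the RACE-tested transcripts have confirmed ends, and about 23% of the candidates have direct peptide support.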