Center for Functional Genomics and Bioinformatics, TransDisciplinary University, Institute of Trans-Disciplinary Health Sciences and Technology, Bengaluru 560064, India.
Center for Cellular and Molecular Platforms, National Centre for Biological Sciences, Bengaluru 560065, India.
Plant Physiol. 2018 Apr;176(4):2772-2788. doi: 10.1104/pp.17.01764. Epub 2018 Feb 12.
Indian sandalwood (Santalum album) is an important tropical evergreen tree known for its fragrant heartwood-derived essential oil and its valuable carving wood. Here, we applied an integrated genomic, transcriptomic, and proteomic approach to assemble and annotate the Indian sandalwood genome. Our genome sequencing resulted in the establishment of a draft map of the smallest genome for any woody tree species to date (221 Mb). The genome annotation predicted 38,119 protein-coding genes and 27.42% repetitive DNA elements. In-depth proteome analysis revealed the identities of 72,325 unique peptides, which confirmed 10,076 of the predicted genes. The addition of transcriptomic and proteogenomic approaches resulted in the identification of 53 novel proteins and 34 gene-correction events that were missed by genomic approaches. Proteogenomic analysis also helped in reassigning 1,348 potential noncoding RNAs as bona fide protein-coding messenger RNAs. Gene expression patterns at the RNA and protein levels indicated that peptide sequencing was useful in capturing proteins encoded by nuclear and organellar genomes alike. Mass spectrometry-based proteomic evidence provided an unbiased approach toward the identification of proteins encoded by organellar genomes. Such proteins are often missed in transcriptome data sets due to the enrichment of only messenger RNAs that contain poly(A) tails. Overall, the use of integrated omic approaches enhanced the quality of the assembly and annotation of this nonmodel plant genome. The availability of genomic, transcriptomic, and proteomic data will enhance genomics-assisted breeding, germplasm characterization, and conservation of sandalwood trees.