Wang Wenjiao, Shen Chengcheng, Wen Xinqiang, Li Anqi, Gao Qi, Xu Zhaoying, Wei Yuping, Li Yushun, Guan Dailu, Liu Bin
College of Horticulture, Shanxi Agricultural University, Jinzhong, 030801, China.
Hami-melon Research Center, Xinjiang Academy of Agricultural Sciences, Urumqi, 830091, China.
BMC Genomics. 2025 Jan 10;26(1):25. doi: 10.1186/s12864-025-11212-w.
Identification of global transcriptional events is crucial for genome annotation, as accurate annotation enhances the efficiency and comparability of genomic information across species. However, the annotation of transcripts in the cucumber genome remains to be improved, and many transcriptional events have not been well studied.
We collected 1,904 high-quality public cucumber transcriptome samples from the National Center for Biotechnology Information (NCBI) to identify and annotate transcript isoforms in the cucumber genome. Over 44.26 billion Q30 clean reads were mapped to the cucumber genome, with an average mapping rate of 92.75%. Transcriptome assembly identified 151,453 transcripts spanning 20,442 loci. Of these, 12.7% exactly matched annotated genes in the cucumber reference genome, and more than 80% were classified as novel isoforms. Approximately 96.6% of the novel isoforms originated from known gene loci, while around 3.3% were derived from novel gene loci. Coding potential prediction identified 4,543 long non-coding RNAs (lncRNAs) across 3,376 loci. Building on these results, we identified tissue-specific transcripts in 10 tissues: 1,655 annotated genes and 4,214 predicted transcripts were classified as tissue-specific. The root had the highest number of tissue-specific transcripts, followed by the shoot apex. Subsequent selective pressure analysis revealed that tissue-specific regions experienced stronger directional selection than non-specific regions.
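The headline figures above imply a few derived quantities that help gauge the scale of the dataset. The following is a minimal back-of-the-envelope sketch, not part of the authors' pipeline: the constant and function names are hypothetical, the read total is treated as a lower bound because the text says "over 44.26 billion", and the transcript count derived from the rounded 12.7% figure is approximate.

```python
# Figures quoted in the abstract (names are illustrative, not from the paper).
N_SAMPLES = 1_904            # public transcriptome samples
CLEAN_READS = 44.26e9        # "over 44.26 billion" reads, so a lower bound
N_TRANSCRIPTS = 151_453      # assembled transcripts
N_LOCI = 20_442              # loci spanned by those transcripts
EXACT_MATCH_FRACTION = 0.127 # rounded share matching the reference annotation
N_LNCRNA = 4_543             # predicted lncRNAs
N_LNCRNA_LOCI = 3_376        # loci carrying those lncRNAs


def summarize():
    """Return simple ratios derived from the reported counts."""
    return {
        # Mean clean reads per sample (lower bound).
        "reads_per_sample": CLEAN_READS / N_SAMPLES,
        # Mean assembled transcripts per locus.
        "transcripts_per_locus": N_TRANSCRIPTS / N_LOCI,
        # Approximate count of transcripts matching the annotation exactly.
        "exact_match_transcripts": round(EXACT_MATCH_FRACTION * N_TRANSCRIPTS),
        # Mean predicted lncRNAs per lncRNA locus.
        "lncrna_per_locus": N_LNCRNA / N_LNCRNA_LOCI,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:,.2f}")
```

Under these assumptions the data amount to roughly 23 million clean reads per sample and about 7.4 assembled transcripts per locus.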
By analyzing thousands of published transcriptome datasets, we identified abundant transcriptional events and tissue-specific transcripts in cucumber. This study adds value to existing public data and offers insights for further exploration of a more comprehensive tissue regulatory network in cucumber.