Tea Research Institute, Zhejiang University, Hangzhou, 310058, Zhejiang, China.
School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore, 637551, Singapore.
BMC Plant Biol. 2024 May 8;24(1):373. doi: 10.1186/s12870-024-05086-5.
As one of the world's most important beverage crops, the tea plant (Camellia sinensis) is renowned for its unique flavors and numerous beneficial secondary metabolites, which has drawn researchers to investigate how tea quality forms. The growing volume of tea plant transcriptome data in public databases has made large-scale co-expression analysis feasible, helping to meet the demand for functional characterization of tea plant genes. However, larger-scale co-expression analyses are not always more effective, because multidimensional noise increases with scale. Analyzing a subset of samples, obtained by effectively downsampling and reorganizing the global sample set, often yields more accurate co-expression results. In addition, analyses based on the global sample set are more likely to overlook condition-specific gene interactions, which may be of greater biological interest.
Here, we used k-means clustering to organize and classify the global tea plant samples into clustered samples. We then annotated these clusters with metadata to determine the "condition" each cluster represents. Next, we performed weighted gene co-expression network analysis (WGCNA) separately on the global samples and on the clustered samples, yielding global modules and cluster-specific modules. Comparing the two showed that cluster-specific modules are more accurate in co-expression analysis. To measure how condition-specific genes are within condition-specific clusters, we introduced the correlation difference value (CDV). Incorporating the CDV into co-expression analysis makes it possible to assess the condition specificity of genes. This approach identified a series of high-CDV transcription factor-encoding genes that are upregulated during sustained cold treatment in Camellia sinensis leaves and buds, and pinpointed a pair of genes that participate in the antioxidant defense system of tea plants under sustained cold stress.
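The workflow described above (cluster the samples, then compare within-cluster and global co-expression) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the expression values are toy data, the gene names (`geneA`, `geneB`, `marker`) are hypothetical, plain Pearson correlation stands in for WGCNA, and the k-means routine uses a deterministic farthest-point initialization for reproducibility. The abstract does not give the formula for the CDV, so `correlation_difference` below is only one plausible reading (cluster-level correlation minus global correlation for a gene pair), not the authors' definition.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def kmeans_samples(points, k, max_iter=100):
    """Lloyd's k-means over samples; returns one cluster label per sample."""
    # Assumption: deterministic farthest-point initialization, chosen for this sketch.
    centers = [list(points[0])]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(farthest))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def correlation_difference(expr, gene_a, gene_b, sample_idx):
    """Hypothetical CDV: within-cluster minus global Pearson correlation of a gene pair."""
    r_global = pearson(expr[gene_a], expr[gene_b])
    sub_a = [expr[gene_a][i] for i in sample_idx]
    sub_b = [expr[gene_b][i] for i in sample_idx]
    return pearson(sub_a, sub_b) - r_global


# Toy expression matrix: 8 samples, the last 4 mimicking a cold-treated condition.
expr = {
    "geneA":  [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0],
    "geneB":  [4.0, 1.0, 3.0, 2.0, 1.1, 2.0, 2.9, 4.2],
    "marker": [0.0, 0.0, 0.0, 0.0, 10.0, 10.0, 10.0, 10.0],
}
genes = list(expr)
n_samples = len(expr["geneA"])

# Each sample is a point in gene-expression space.
samples = [tuple(expr[g][i] for g in genes) for i in range(n_samples)]
labels = kmeans_samples(samples, k=2)

# Take the cluster containing the last sample as the "cold" cluster.
cold_idx = [i for i, lab in enumerate(labels) if lab == labels[-1]]
cdv_cold = correlation_difference(expr, "geneA", "geneB", cold_idx)
print(labels, round(cdv_cold, 2))
```

In this toy data, `geneA` and `geneB` track each other closely only within the cold-like samples, so the pair's within-cluster correlation is much higher than its global correlation. That is the kind of condition-specific relationship a global analysis would dilute. A real analysis would cluster on normalized expression from many more samples and build modules with WGCNA rather than compare a single gene pair.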
In summary, downsampling and reorganizing the sample set improved the accuracy of co-expression analysis. Cluster-specific modules captured condition-specific gene interactions more accurately than global modules. Introducing the CDV made it possible to assess condition specificity in gene co-expression analysis. Using this approach, we identified a series of high-CDV transcription factor-encoding genes associated with sustained cold stress in Camellia sinensis. This study highlights the importance of considering condition specificity in co-expression analysis and provides insights into the regulation of the cold stress response in Camellia sinensis.