Toubiana David, Puzis Rami, Sadka Avi, Blumwald Eduardo
Department of Plant Sciences, University of California, Davis, Davis, California.
Department of Software and Information Systems Engineering, Ben-Gurion University of the Negev, Beer Sheva, Israel.
J Comput Biol. 2019 Dec;26(12):1349-1366. doi: 10.1089/cmb.2019.0221. Epub 2019 Jul 30.
Weighted gene co-expression network analysis (WGCNA) is a widely used software tool for establishing relationships between phenotypic traits and gene expression data. It generates gene modules and then correlates the first principal component of each module with phenotypic traits, proposing a functional relationship expressed by the correlation coefficient. However, gene modules often contain thousands of genes with different functional backgrounds. Here, we developed a stochastic optimization algorithm, a genetic algorithm (GA), that optimizes the trait-to-gene-module relationship by gradually increasing the correlation between the trait and a subset of the module's genes. We demonstrated the GA on a Japanese plum hormone profile and an RNA-seq dataset. The correlation between the subset of module genes and the trait increased, while the number of correlated genes became small enough to allow their individual assessment. Gene ontology (GO) term enrichment analysis of the gene sets identified by the GA showed an increase in the specificity of GO terms associated with fruit hormone balance, compared with GO enrichment analysis of the gene modules generated by WGCNA and other methods.
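The idea described in the abstract, searching for a subset of module genes whose first principal component correlates more strongly with a trait, can be illustrated with a minimal genetic algorithm sketch. This is not the authors' implementation: the function names (`eigengene`, `fitness`, `run_ga`), the operators (tournament selection, uniform crossover, bit-flip mutation, elitism), and all parameter values are assumptions chosen for illustration. The sketch also assumes an expression matrix with samples in rows and genes in columns, and a trait that is not constant across samples.

```python
import numpy as np


def eigengene(expr):
    """Return first-principal-component scores of a samples x genes matrix."""
    centered = expr - expr.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, 0] * s[0]


def fitness(mask, expr, trait, min_genes=2):
    """Absolute Pearson correlation between the subset eigengene and the trait.

    Subsets with fewer than min_genes genes (an assumed cutoff) score 0.
    """
    if mask.sum() < min_genes:
        return 0.0
    pc1 = eigengene(expr[:, mask])
    if np.std(pc1) == 0:
        return 0.0
    return float(abs(np.corrcoef(pc1, trait)[0, 1]))


def run_ga(expr, trait, pop_size=40, generations=60, keep_prob=0.3,
           mutation_rate=0.02, seed=0):
    """Search for a gene subset (boolean mask) with high eigengene-trait correlation."""
    rng = np.random.default_rng(seed)
    n_genes = expr.shape[1]
    # Each individual is a boolean mask over the genes of one module.
    pop = rng.random((pop_size, n_genes)) < keep_prob
    best_mask, best_fit = None, -1.0
    for _ in range(generations):
        fits = np.array([fitness(ind, expr, trait) for ind in pop])
        top = int(fits.argmax())
        if fits[top] > best_fit:
            best_mask, best_fit = pop[top].copy(), float(fits[top])
        # Tournament selection: the fitter of two randomly drawn individuals wins.
        a = rng.integers(pop_size, size=pop_size)
        b = rng.integers(pop_size, size=pop_size)
        parents = pop[np.where(fits[a] >= fits[b], a, b)]
        # Uniform crossover with a randomly shuffled partner.
        partners = parents[rng.permutation(pop_size)]
        take = rng.random((pop_size, n_genes)) < 0.5
        children = np.where(take, parents, partners)
        # Bit-flip mutation adds or removes individual genes.
        flip = rng.random((pop_size, n_genes)) < mutation_rate
        children = children ^ flip
        # Elitism: carry the best subset found so far into the next generation.
        children[0] = best_mask
        pop = children
    return best_mask, best_fit


if __name__ == "__main__":
    # Synthetic data for illustration only: 30 samples, 50 genes,
    # with the first 5 genes constructed to track the trait.
    rng = np.random.default_rng(1)
    trait = rng.normal(size=30)
    expr = rng.normal(size=(30, 50))
    expr[:, :5] += 3 * trait[:, None]
    mask, fit = run_ga(expr, trait)
    print("genes selected:", int(mask.sum()), "correlation:", round(fit, 3))
```

As written, the fitness rewards correlation only and does not penalize subset size. Whether and how the published method constrains the number of genes is not stated in the abstract, so any size penalty added here would be a further assumption.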