Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, 110012, India.
Biostatistics Shared Facility, JG Brown Cancer Center and Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, 40202, KY, USA.
Sci Rep. 2018 Feb 5;8(1):2391. doi: 10.1038/s41598-018-19736-w.
Gene set analysis is usually carried out using gene ontology terms and known biological pathways. These approaches may not establish any formal relation between genotype and trait-specific phenotype. In plant biology and breeding, the analysis of gene sets together with trait-specific Quantitative Trait Loci (QTL) data is considered a rich source for biological knowledge discovery. We therefore proposed a statistical approach called Gene Set Analysis with QTLs (GSAQ) for interpreting gene expression data in the context of gene sets and traits. The utility of GSAQ was studied on five different complex abiotic and biotic stress scenarios in rice, which yielded gene sets enriched for the specific trait or stress. Further, the GSAQ approach was more effective than the existing approach at performing gene set analysis with underlying QTLs and at identifying QTL candidate genes. The GSAQ approach also provided two potentially biologically relevant criteria for evaluating the performance of gene selection methods. Based on the proposed approach, an R package, GSAQ (https://cran.r-project.org/web/packages/GSAQ), has been developed. The GSAQ approach provides a valuable platform for integrating gene expression data with genetically rich QTL data.
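The abstract does not spell out the test statistic, so the following is only a minimal sketch of one common way to ask whether a selected gene set is enriched for genes lying inside trait-specific QTL regions: a hypergeometric (one-sided) overlap test. The function names, the tuple layout `(chromosome, start, end)`, and the toy coordinates are all hypothetical and are not taken from the GSAQ R package.

```python
from math import comb  # requires Python 3.8+


def overlaps_qtl(gene, qtls):
    """Return True if the gene interval overlaps any QTL interval on the same chromosome."""
    chrom, start, end = gene
    return any(
        chrom == q_chrom and start <= q_end and end >= q_start
        for q_chrom, q_start, q_end in qtls
    )


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when n genes are drawn without replacement from N genes, K of which are QTL hits."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def qtl_enrichment(genes, selected, qtls):
    """Count QTL-hit genes in the selected set and return (count, enrichment p-value)."""
    qtl_hits = {name for name, loc in genes.items() if overlaps_qtl(loc, qtls)}
    k = len(qtl_hits & set(selected))
    p_value = hypergeom_upper_tail(k, len(genes), len(qtl_hits), len(selected))
    return k, p_value


# Toy data (hypothetical coordinates): six genes, two QTL regions.
genes = {
    "g1": ("chr1", 100, 200),
    "g2": ("chr1", 500, 600),
    "g3": ("chr1", 1500, 1600),
    "g4": ("chr2", 100, 200),
    "g5": ("chr2", 900, 1000),
    "g6": ("chr3", 50, 150),
}
qtls = [("chr1", 50, 700), ("chr2", 800, 1200)]
selected = ["g1", "g2", "g5"]  # e.g. genes chosen by some gene selection method

k, p_value = qtl_enrichment(genes, selected, qtls)
print(f"QTL-hit genes in selected set: {k}, p-value: {p_value:.4f}")
```

In this toy case every selected gene falls inside a QTL region, so the p-value is small; with real data the background would be the full set of expressed genes and the QTL intervals would come from a curated trait-specific QTL database.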