Gene set enrichment analysis for non-monotone association and multiple experimental categories.

Authors

Lin Rongheng, Dai Shuangshuang, Irwin Richard D, Heinloth Alexandra N, Boorman Gary A, Li Leping

Affiliation

Biostatistics Branch, National Institute of Environmental Health Science, Research Triangle Park, NC 27713, USA.

Publication

BMC Bioinformatics. 2008 Nov 14;9:481. doi: 10.1186/1471-2105-9-481.

DOI:10.1186/1471-2105-9-481
PMID:19014579
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2636811/
Abstract

BACKGROUND

Recently, microarray data analyses using functional pathway information, e.g., gene set enrichment analysis (GSEA) and significance analysis of function and expression (SAFE), have gained recognition as a way to identify biological pathways/processes associated with a phenotypic endpoint. In these analyses, a local statistic is used to assess the association between the expression level of a gene and the value of a phenotypic endpoint. Then these gene-specific local statistics are combined to evaluate association for pre-selected sets of genes. Commonly used local statistics include t-statistics for binary phenotypes and correlation coefficients that assume a linear or monotone relationship between a continuous phenotype and gene expression level. Methods applicable to continuous non-monotone relationships are needed. Furthermore, for multiple experimental categories, methods that combine multiple GSEA/SAFE analyses are needed.
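
The two-stage idea described above (a per-gene local statistic, then a set-level global statistic judged against permutations) can be sketched as follows. This is a minimal illustration, not the GSEA or SAFE implementation: the function names `abs_correlation` and `gene_set_pvalue`, the use of absolute Pearson correlation as the local statistic, and the mean-difference global statistic are all assumptions made for the example.

```python
import numpy as np

def abs_correlation(expr, phenotype):
    """Local statistic (assumed here): |Pearson r| of each gene (row) with the phenotype."""
    e = expr - expr.mean(axis=1, keepdims=True)
    p = phenotype - phenotype.mean()
    num = e @ p
    den = np.sqrt((e ** 2).sum(axis=1) * (p ** 2).sum())
    return np.abs(num / den)

def gene_set_pvalue(expr, phenotype, in_set, n_perm=200, seed=0):
    """One-sided permutation p-value for one gene set.

    expr      : genes x samples array
    phenotype : length-samples array
    in_set    : boolean mask over genes marking set membership
    The global statistic (an assumption for this sketch) is the mean local
    statistic inside the set minus the mean outside it.
    """
    rng = np.random.default_rng(seed)

    def global_stat(ph):
        local = abs_correlation(expr, ph)
        return local[in_set].mean() - local[~in_set].mean()

    observed = global_stat(phenotype)
    # Permuting the phenotype breaks the gene-phenotype link while
    # preserving the correlation structure among genes.
    null = np.array([global_stat(rng.permutation(phenotype))
                     for _ in range(n_perm)])
    return (1 + np.sum(null >= observed)) / (1 + n_perm)
```

Permuting phenotype values rather than gene labels keeps gene-gene correlation intact under the null. Note that this simple shuffle treats all samples as exchangeable, which would not hold under a structured design.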

RESULTS

For continuous or ordinal phenotypic outcome, we propose to use as the local statistic the coefficient of multiple determination (i.e., the square of the multiple correlation coefficient) R² from fitting natural cubic spline models to the phenotype-expression relationship. Next, we incorporate this association measure into the GSEA/SAFE framework to identify significant gene sets. Unsigned local statistics, signed global statistics and one-sided p-values are used to reflect our inferential interest. Furthermore, we describe a procedure for inference across multiple GSEA/SAFE analyses. We illustrate our approach using gene expression and liver injury data from liver and blood samples from rats treated with eight hepatotoxicants under multiple time and dose combinations. We set out to identify biological pathways/processes associated with liver injury as manifested by increased blood levels of alanine transaminase in common for most of the eight compounds. Potential statistical dependency resulting from the experimental design is addressed in permutation based hypothesis testing.
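
To make the spline-based local statistic concrete, the sketch below computes R² from a least-squares fit of a natural cubic spline and contrasts it with the squared Pearson correlation on a U-shaped relationship. It is an illustration under stated assumptions, not the authors' code: the helper names, the choice of four knots at the 5%, 35%, 65% and 95% quantiles of the predictor, and the direction of the regression (response on a spline of the predictor) are all choices made for the example.

```python
import numpy as np

def natural_spline_basis(x, knots):
    """Natural cubic spline basis in truncated-power form (no intercept column).

    The fitted curve is cubic between knots and linear beyond the two
    boundary knots. Assumes the knots are distinct and sorted.
    """
    x = np.asarray(x, dtype=float)
    knots = np.asarray(knots, dtype=float)
    K = len(knots)

    def d(k):
        return (np.maximum(x - knots[k], 0.0) ** 3
                - np.maximum(x - knots[K - 1], 0.0) ** 3) / (knots[K - 1] - knots[k])

    cols = [x] + [d(k) - d(K - 2) for k in range(K - 2)]
    return np.column_stack(cols)

def spline_r2(predictor, response, n_knots=4):
    """R^2 of a least-squares natural cubic spline fit of response on predictor."""
    predictor = np.asarray(predictor, dtype=float)
    response = np.asarray(response, dtype=float)
    # Knot placement at predictor quantiles is an assumption of this sketch.
    knots = np.quantile(predictor, np.linspace(0.05, 0.95, n_knots))
    design = np.column_stack([np.ones_like(predictor),
                              natural_spline_basis(predictor, knots)])
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    resid = response - design @ coef
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((response - response.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# A U-shaped (non-monotone) relationship: linear correlation is near zero,
# while the spline R^2 picks up the association.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 50)
y = x ** 2 + rng.normal(0.0, 0.05, size=50)
r2_spline = spline_r2(x, y)
r2_linear = np.corrcoef(x, y)[0, 1] ** 2
```

Because R² is unsigned, it measures strength of association but not direction, which is consistent with the use of unsigned local statistics described above.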

CONCLUSION

The proposed framework captures both linear and non-linear association between gene expression level and a phenotypic endpoint and thus can be viewed as extending the current GSEA/SAFE methodology. The framework for combining results from multiple GSEA/SAFE analyses is flexible to address practical inference interests. Our methods can be applied to microarray data with continuous phenotypes with multi-level design or the meta-analysis of multiple microarray data sets.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6525/2636811/8802b7f9199d/1471-2105-9-481-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6525/2636811/e36d159cc337/1471-2105-9-481-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6525/2636811/e1997fdfbe5f/1471-2105-9-481-2.jpg
