
Modelling p-value distributions to improve theme-driven survival analysis of cancer transcriptome datasets.

Affiliation

School of Medicine, Cardiff University, Heath Park, Cardiff CF144XN, UK.

Publication information

BMC Bioinformatics. 2010 Jan 11;11:19. doi: 10.1186/1471-2105-11-19.

DOI:10.1186/1471-2105-11-19
PMID:20064243
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2824674/
Abstract

BACKGROUND

Theme-driven cancer survival studies address whether the expression signature of genes related to a biological process can predict patient survival time. Although this should ideally be achieved by testing two separate null hypotheses, current methods treat both hypotheses as one. The first test should assess whether a geneset, independent of its composition, is associated with prognosis (frequently done with a survival test). The second test then verifies whether the theme of the geneset is relevant (usually done with an empirical test that compares the geneset of interest with random genesets). Current methods do not test this second null hypothesis because it has been assumed that the distribution of p-values for random genesets (when tested against the first null hypothesis) is uniform. Here we demonstrate that this assumption is generally incorrect and that, consequently, such methods may erroneously associate the biology of a particular geneset with cancer prognosis.

RESULTS

To assess the impact of non-uniform distributions for random genesets in such studies, an automated theme-driven method was developed. This method empirically approximates the p-value distribution of sets of unrelated genes based on a permutation approach, and tests whether predefined sets of biologically related genes are associated with survival. A comparison with a published theme-driven approach revealed non-uniform distributions, suggesting that the original study has a significant problem with false positive rates. When applied to two public cancer datasets, our technique revealed novel ontological categories with prognostic power, including significant associations of "fatty acid metabolism" with overall survival in breast cancer, and of "receptor mediated endocytosis", "brain development", "apical plasma membrane" and "MAPK signaling pathway" with overall survival in lung cancer.
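The two-step idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the survival test or how a geneset is scored, so the mean-expression score, the median split, the log-rank test, and every function and parameter name here (`logrank_pvalue`, `geneset_pvalue`, `theme_pvalue`, `n_random`) are assumptions chosen for illustration. The sketch first computes a survival p-value for the geneset of interest, then compares it with the p-values of random genesets of the same size instead of assuming that those p-values are uniform.

```python
import math
import random
import statistics


def logrank_pvalue(times, events, groups):
    """Two-group log-rank test. events: 1 = death, 0 = censored; groups: 0 or 1."""
    obs_minus_exp = 0.0
    variance = 0.0
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        at_risk = [g for ti, g in zip(times, groups) if ti >= t]
        deaths = [g for ti, e, g in zip(times, events, groups) if ti == t and e]
        n, n1 = len(at_risk), sum(at_risk)
        d, d1 = len(deaths), sum(deaths)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0  # no information to separate the groups
    chi2 = obs_minus_exp ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def geneset_pvalue(expr, genes, times, events):
    """First test (assumed form): split patients at the median of the
    mean expression of the geneset and compare survival with a log-rank test."""
    n_patients = len(times)
    scores = [sum(expr[g][i] for g in genes) / len(genes) for i in range(n_patients)]
    cutoff = statistics.median(scores)
    groups = [1 if s > cutoff else 0 for s in scores]
    return logrank_pvalue(times, events, groups)


def theme_pvalue(expr, geneset, times, events, n_random=200, seed=0):
    """Second test (assumed form): empirical p-value of the geneset against
    random genesets of the same size drawn from all measured genes."""
    rng = random.Random(seed)
    all_genes = sorted(expr)
    observed = geneset_pvalue(expr, geneset, times, events)
    null_ps = [
        geneset_pvalue(expr, rng.sample(all_genes, len(geneset)), times, events)
        for _ in range(n_random)
    ]
    # Add-one correction keeps the empirical p-value strictly above zero.
    hits = sum(p <= observed for p in null_ps)
    return observed, (hits + 1) / (n_random + 1), null_ps
```

Here `expr` is assumed to map each gene name to a list of expression values, one per patient, in the same order as `times` and `events`. The returned `null_ps` can be plotted as a histogram to check how far the random-geneset distribution departs from uniformity in a given dataset.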

CONCLUSIONS

Current methods of theme-driven survival studies assume uniformity of p-values for random genesets, which can lead to false conclusions. Our approach corrects for this pitfall and offers a novel route to identifying higher-level biological themes and pathways with prognostic power in clinical microarray datasets.
