ProbCD: enrichment analysis accounting for categorization uncertainty.

Authors

Vêncio Ricardo Z N, Shmulevich Ilya

Affiliation

Institute for Systems Biology, 1441 North 34th street, Seattle, WA 98103-8904, USA.

Publication

BMC Bioinformatics. 2007 Oct 12;8:383. doi: 10.1186/1471-2105-8-383.

DOI: 10.1186/1471-2105-8-383
PMID: 17935624
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2169266/
Abstract

BACKGROUND

As in many other areas of science, systems biology makes extensive use of statistical association and significance estimates in contingency tables, a type of categorical data analysis known in this field as enrichment (also over-representation or enhancement) analysis. Despite efforts to create probabilistic annotations, especially in the Gene Ontology context, and to deal with uncertainty in high-throughput datasets, current enrichment methods largely ignore this probabilistic information because they are mainly based on variants of Fisher's exact test.
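For reference, the Fisher's exact test baseline can be sketched for the usual 2×2 enrichment table. The p-value is the upper tail of a hypergeometric distribution over tables with fixed margins. This minimal Python version rests on several assumptions. It uses a one-sided (over-representation) test and the table layout shown in the docstring. The function name `fisher_enrichment_p` is hypothetical and is not part of ProbCD.

```python
from math import comb


def fisher_enrichment_p(a: int, b: int, c: int, d: int) -> float:
    """One-sided (over-representation) Fisher exact test p-value for a 2x2 table.

    Table layout assumed in this sketch:
                        in category   not in category
        selected             a               b
        not selected         c               d
    """
    n_selected = a + b        # size of the selected gene list
    n_category = a + c        # genes annotated to the category
    n_total = a + b + c + d   # all genes in the universe
    denom = comb(n_total, n_selected)
    upper = min(n_selected, n_category)
    # Sum hypergeometric probabilities of the observed table and every
    # table with at least as many selected genes in the category.
    numer = sum(
        comb(n_category, x) * comb(n_total - n_category, n_selected - x)
        for x in range(a, upper + 1)
    )
    return numer / denom
```

For example, `fisher_enrichment_p(3, 1, 1, 3)` adds the probability of the observed table to that of the single more extreme table. The result is (16 + 1) / 70 = 17/70 ≈ 0.243. Each cell here is a hard 0/1 count per gene, which is exactly the static-table assumption that the abstract criticizes.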

RESULTS

We developed ProbCD, an open-source, R-based software package for probabilistic categorical data analysis that does not require a static contingency table. The contingency table for the enrichment problem is built using the expectation of a Bernoulli Scheme stochastic process given the categorization probabilities. An online interface for non-programmers is available at: http://xerad.systemsbiology.net/ProbCD/.
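The idea of building the table from categorization probabilities can be illustrated with a short sketch. The setup is an assumption for illustration. Each gene has a probability `p_selected` of belonging to the selected list (for example, being differentially expressed) and a probability `p_category` of carrying the annotation. The two labels are treated as independent Bernoulli variables for each gene. Under that assumption, the expected count in each cell is a sum of per-gene probability products. This is not ProbCD's actual code, and the function name is hypothetical.

```python
from typing import List, Sequence


def expected_contingency_table(
    p_selected: Sequence[float], p_category: Sequence[float]
) -> List[List[float]]:
    """Expected 2x2 table when each gene's two labels are independent Bernoulli draws.

    Rows: selected / not selected. Columns: in category / not in category.
    """
    if len(p_selected) != len(p_category):
        raise ValueError("probability vectors must have the same length")
    a = b = c = d = 0.0
    for ps, pc in zip(p_selected, p_category):
        if not (0.0 <= ps <= 1.0 and 0.0 <= pc <= 1.0):
            raise ValueError("probabilities must lie in [0, 1]")
        # Each gene contributes fractional mass to all four cells.
        a += ps * pc
        b += ps * (1.0 - pc)
        c += (1.0 - ps) * pc
        d += (1.0 - ps) * (1.0 - pc)
    return [[a, b], [c, d]]
```

When every probability is 0 or 1, the function returns the ordinary integer table, so the static-table case is recovered as a special case. For instance, `expected_contingency_table([1, 1, 0, 0], [1, 0, 1, 0])` gives `[[1.0, 1.0], [1.0, 1.0]]`.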

CONCLUSION

We present an analysis framework and software tools to address the issue of uncertainty in categorical data analysis. In particular, concerning the enrichment analysis, ProbCD can accommodate: (i) the stochastic nature of the high-throughput experimental techniques and (ii) probabilistic gene annotation.
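One way to see how per-gene uncertainty propagates into the table is to sample the Bernoulli scheme directly. This sketch assumes that each gene's two labels (selected or not, annotated or not) are independent Bernoulli draws with the given probabilities. Each draw then assigns every gene to one cell and yields an ordinary integer table. Averaging many draws approximates the expected counts. The Monte Carlo illustration is hypothetical: the function names, `n_draws`, and `seed` are illustrative only, and the code does not describe how ProbCD computes its results.

```python
import random
from typing import List, Sequence


def sample_contingency_table(
    p_selected: Sequence[float], p_category: Sequence[float], rng: random.Random
) -> List[List[int]]:
    """Draw one integer 2x2 table from independent per-gene Bernoulli labels.

    Rows: 0 = selected, 1 = not selected.
    Columns: 0 = in category, 1 = not in category.
    """
    table = [[0, 0], [0, 0]]
    for ps, pc in zip(p_selected, p_category):
        selected = rng.random() < ps      # True with probability ps
        in_category = rng.random() < pc   # True with probability pc
        table[0 if selected else 1][0 if in_category else 1] += 1
    return table


def monte_carlo_mean_table(
    p_selected: Sequence[float],
    p_category: Sequence[float],
    n_draws: int = 10_000,
    seed: int = 0,
) -> List[List[float]]:
    """Average n_draws sampled tables; the result approximates the expected table."""
    rng = random.Random(seed)
    totals = [[0, 0], [0, 0]]
    for _ in range(n_draws):
        t = sample_contingency_table(p_selected, p_category, rng)
        for i in range(2):
            for j in range(2):
                totals[i][j] += t[i][j]
    return [[totals[i][j] / n_draws for j in range(2)] for i in range(2)]
```

Consider `p_selected = [0.9, 0.2, 0.6]` and `p_category = [0.7, 0.1, 0.5]`. The exact expected table is `[[0.95, 0.75], [0.35, 0.95]]`, and the Monte Carlo mean should land close to it for a large `n_draws`. The spread of the individual sampled tables shows how much the uncertainty in the labels could move an enrichment statistic.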