Text-based over-representation analysis of microarray gene lists with annotation bias.

Authors

Leong Hui Sun, Kipling David

Affiliation

Department of Pathology, School of Medicine, Cardiff University, Heath Park, Cardiff CF14 4XN, UK.

Publication details

Nucleic Acids Res. 2009 Jun;37(11):e79. doi: 10.1093/nar/gkp310. Epub 2009 May 8.

DOI:10.1093/nar/gkp310

PMID:19429895

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2699530/

Abstract

A major challenge in microarray data analysis is the functional interpretation of gene lists. A common approach to address this is over-representation analysis (ORA), which uses the hypergeometric test (or its variants) to evaluate whether a particular functionally defined group of genes is represented more than expected by chance within a gene list. Existing applications of ORA have been largely limited to pre-defined terminologies such as GO and KEGG. We report our explorations of whether ORA can be applied to a wider mining of free-text. We found that a hitherto underappreciated feature of experimentally derived gene lists is that the constituents have substantially more annotation associated with them, as they have been researched upon for a longer period of time. This bias, a result of patterns of research activity within the biomedical community, is a major problem for classical hypergeometric test-based ORA approaches, which cannot account for such bias. We have therefore developed three approaches to overcome this bias, and demonstrate their usability in a wide range of published datasets covering different species. A comparison with existing tools that use GO terms suggests that mining PubMed abstracts can reveal additional biological insight that may not be possible by mining pre-defined ontologies alone.
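To make the classical test concrete, the following is a minimal sketch of a hypergeometric over-representation p-value, the kind of calculation the abstract describes as the basis of ORA. It is not the authors' implementation and does not include any of their three bias-correcting approaches, which the abstract does not detail. The function name `ora_p_value`, its parameter names, and the example counts are all hypothetical and chosen only for illustration.

```python
from math import comb


def ora_p_value(population: int, annotated: int, list_size: int, hits: int) -> float:
    """Upper-tail hypergeometric probability P(X >= hits).

    population: number of genes in the background (e.g. all genes on the array)
    annotated:  background genes associated with the term of interest
    list_size:  number of genes in the experimentally derived list
    hits:       genes in the list that are associated with the term
    (All four names are illustrative assumptions, not taken from the paper.)
    """
    # Number of ways to draw a list of this size from the background.
    total = comb(population, list_size)
    # The list cannot contain more annotated genes than exist or than it holds.
    upper = min(annotated, list_size)
    # Sum the probability mass for every outcome at least as extreme as `hits`.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, list_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Made-up counts: 20000 background genes, 200 linked to a term,
    # a 100-gene list in which 5 genes carry that term.
    print(ora_p_value(population=20000, annotated=200, list_size=100, hits=5))
```

This test treats every gene as equally likely to be linked to a term. That is the assumption the abstract argues breaks down when genes in experimentally derived lists carry more annotation than average.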

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4d5c/2699530/eb832f82cc7d/gkp310f1.jpg
