Investigating the concordance of Gene Ontology terms reveals the intra- and inter-platform reproducibility of enrichment analysis.

Affiliation

College of Chemistry, Sichuan University, Chengdu, 610064, People's Republic of China.

Publication information

BMC Bioinformatics. 2013 Apr 29;14:143. doi: 10.1186/1471-2105-14-143.

DOI:10.1186/1471-2105-14-143

PMID:23627640

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3644270/

Abstract

BACKGROUND

Reliability and reproducibility of differentially expressed genes (DEGs) are essential for the biological interpretation of microarray data. The MicroArray Quality Control (MAQC) project launched by the US Food and Drug Administration (FDA) showed that the lists of DEGs generated by intra- and inter-platform comparisons can reach a high level of concordance, depending mainly on the statistical criteria used for ranking and selecting DEGs. Generally, combining fold change ranking with a non-stringent p-value cutoff produces reproducible lists of DEGs. For further interpretation of gene expression data, statistical methods of gene enrichment analysis provide powerful tools for associating DEGs with prior biological knowledge, e.g. Gene Ontology (GO) terms and pathways, and are widely used in genome-wide research. Although DEG lists generated from the same compared conditions have proved reliable, reproducible enrichment results remain crucial for discovering the underlying molecular mechanisms that differentiate the two conditions. Therefore, it is important to know whether enrichment results remain reproducible when the input DEG lists are generated with different statistical criteria in inter-laboratory and cross-platform comparisons. In this study, we used the MAQC data sets to systematically assess the intra- and inter-platform concordance of GO terms enriched by Gene Set Enrichment Analysis (GSEA) and LRpath.
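As a rough illustration of the DEG selection rule described above (fold change ranking combined with a non-stringent p-value cutoff), the following Python sketch assumes that per-gene log2 fold changes and p-values have already been computed. The function name `select_degs`, the 0.05 cutoff, and the default list size are hypothetical choices for illustration, not the parameters used in the study.

```python
from typing import Dict, List, Tuple


def select_degs(
    stats: Dict[str, Tuple[float, float]],
    p_cutoff: float = 0.05,
    top_n: int = 100,
) -> List[str]:
    """Rank genes by absolute log2 fold change after a lenient p-value filter.

    `stats` maps a gene ID to (log2 fold change, p-value). The cutoff and
    list size are illustrative assumptions, not values from the paper.
    """
    # Step 1: keep only genes that pass the non-stringent p-value filter.
    passing = [(gene, lfc) for gene, (lfc, p) in stats.items() if p < p_cutoff]
    # Step 2: rank by fold-change magnitude (largest first); ties are broken
    # by gene ID so the output is deterministic.
    passing.sort(key=lambda item: (-abs(item[1]), item[0]))
    # Step 3: report the top-ranked genes as the DEG list.
    return [gene for gene, _ in passing[:top_n]]
```

For example, `select_degs({"g1": (2.5, 0.01), "g2": (-3.0, 0.04), "g3": (4.0, 0.20), "g4": (0.5, 0.001)}, top_n=2)` returns `["g2", "g1"]`: `g3` has the largest fold change but fails the p-value filter, while `g4` passes the filter but ranks below the top two. The p-value acts only as a loose gate here; the ranking itself is driven by fold change.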

RESULTS

In intra-platform comparisons, the overlap percentage of enriched GO terms was as high as ~80% when the input DEG lists were generated by fold change ranking and by Significance Analysis of Microarrays (SAM), whereas the percentage decreased by about 20% when the DEG lists were generated by fold change ranking and by t-test, or by SAM and by t-test. Similar results were found in inter-platform comparisons.
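The overlap percentage reported above can be sketched as a simple set comparison. This Python example assumes each enrichment run yields a list of significant GO term identifiers and, as an assumption, divides the number of shared terms by the size of the first list; the study's exact definition may differ, particularly when the two lists have different lengths. `overlap_percentage` is a hypothetical helper, and the term IDs in the usage note are placeholders.

```python
from typing import Iterable


def overlap_percentage(terms_a: Iterable[str], terms_b: Iterable[str]) -> float:
    """Percentage of enriched GO terms in list A that also appear in list B.

    Using the size of list A as the denominator is an assumption; the
    measure is therefore directional when the two lists differ in length.
    """
    a, b = set(terms_a), set(terms_b)
    # An empty reference list has no terms to reproduce; report 0 by convention.
    if not a:
        return 0.0
    return 100.0 * len(a & b) / len(a)
```

For example, `overlap_percentage(["t1", "t2", "t3", "t4", "t5"], ["t1", "t2", "t3", "t4", "t6"])` returns `80.0`, since four of the five terms are shared. Swapping the arguments can give a different value when the lists differ in size.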

CONCLUSIONS

Our results demonstrate that highly concordant lists of DEGs can ensure highly concordant enrichment results. Importantly, when the DEG lists are generated by the straightforward method of combining fold change ranking with a non-stringent p-value cutoff, enrichment analysis produces reproducible enriched GO terms for biological interpretation.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9933/3644270/75c136f63dc2/1471-2105-14-143-1.jpg

Similar articles

The balance of reproducibility, sensitivity, and specificity of lists of differentially expressed genes in microarray studies.

BMC Bioinformatics. 2008 Aug 12;9 Suppl 9(Suppl 9):S10. doi: 10.1186/1471-2105-9-S9-S10.

Rat toxicogenomic study reveals analytical consistency across microarray platforms.

Nat Biotechnol. 2006 Sep;24(9):1162-9. doi: 10.1038/nbt1238.

Reproducibility of microarray data: a further analysis of microarray quality control (MAQC) data.

BMC Bioinformatics. 2007 Oct 25;8:412. doi: 10.1186/1471-2105-8-412.

Performance comparison of two microarray platforms to assess differential gene expression in human monocyte and macrophage cells.

BMC Genomics. 2008 Jun 25;9:302. doi: 10.1186/1471-2164-9-302.

Cross-platform comparison of SYBR Green real-time PCR with TaqMan PCR, microarrays and other gene expression measurement technologies evaluated in the MicroArray Quality Control (MAQC) study.

BMC Genomics. 2008 Jul 11;9:328. doi: 10.1186/1471-2164-9-328.

Evaluation of gene expression data generated from expired Affymetrix GeneChip® microarrays using MAQC reference RNA samples.

BMC Bioinformatics. 2010 Oct 7;11 Suppl 6(Suppl 6):S10. doi: 10.1186/1471-2105-11-S6-S10.

Improving the prediction of chemotherapeutic sensitivity of tumors in breast cancer via optimizing the selection of candidate genes.

Comput Biol Chem. 2014 Apr;49:71-8. doi: 10.1016/j.compbiolchem.2013.12.002. Epub 2014 Jan 1.

Evaluating methods for ranking differentially expressed genes applied to microArray quality control data.

BMC Bioinformatics. 2011 Jun 6;12:227. doi: 10.1186/1471-2105-12-227.

Gene set enrichment meta-learning analysis: next-generation sequencing versus microarrays.

BMC Bioinformatics. 2010 Apr 8;11:176. doi: 10.1186/1471-2105-11-176.

Cited by

CT Image-Based Biopsy to Aid Prediction of HOPX Expression Status and Prognosis for Non-Small Cell Lung Cancer Patients.

Cancers (Basel). 2023 Apr 10;15(8):2220. doi: 10.3390/cancers15082220.

Transcriptome analysis in LRRK2 and idiopathic Parkinson's disease at different glucose levels.

NPJ Parkinsons Dis. 2021 Dec 1;7(1):109. doi: 10.1038/s41531-021-00255-x.

Enhancing reproducibility of gene expression analysis with known protein functional relationships: The concept of well-associated protein.

PLoS Comput Biol. 2020 Feb 14;16(2):e1007684. doi: 10.1371/journal.pcbi.1007684. eCollection 2020 Feb.

Atlas of RNA sequencing profiles for normal human tissues.

Sci Data. 2019 Apr 23;6(1):36. doi: 10.1038/s41597-019-0043-4.

Identification of key regulatory genes connected to NF-κB family of proteins in visceral adipose tissues using gene expression and weighted protein interaction network.

PLoS One. 2019 Apr 23;14(4):e0214337. doi: 10.1371/journal.pone.0214337. eCollection 2019.

Bioinformatics Analysis Reveals the Altered Gene Expression of Patients with Postmenopausal Osteoporosis Using Liuweidihuang Pills Treatment.

Biomed Res Int. 2019 Jan 27;2019:1907906. doi: 10.1155/2019/1907906. eCollection 2019.

Shambhala: a platform-agnostic data harmonizer for gene expression data.

BMC Bioinformatics. 2019 Feb 6;20(1):66. doi: 10.1186/s12859-019-2641-8.

Data aggregation at the level of molecular pathways improves stability of experimental transcriptomic and proteomic data.

Cell Cycle. 2017 Oct 2;16(19):1810-1823. doi: 10.1080/15384101.2017.1361068. Epub 2017 Aug 21.

Concordance analysis of microarray studies identifies representative gene expression changes in Parkinson's disease: a comparison of 33 human and animal studies.

BMC Neurol. 2017 Mar 23;17(1):58. doi: 10.1186/s12883-017-0838-x.

How consistent are we? Interlaboratory comparison study in fathead minnows using the model estrogen 17α-ethinylestradiol to develop recommendations for environmental transcriptomics.

Environ Toxicol Chem. 2017 Oct;36(10):2614-2623. doi: 10.1002/etc.3799. Epub 2017 Apr 19.

References

FDR-FET: an optimizing gene set enrichment analysis method.

Adv Appl Bioinform Chem. 2011;4:37-42. doi: 10.2147/AABC.S15840. Epub 2011 Mar 15.

Gene set enrichment analysis provides insight into novel signalling pathways in breast cancer stem cells.

Br J Cancer. 2010 Jan 5;102(1):206-12. doi: 10.1038/sj.bjc.6605468. Epub 2009 Dec 8.

PLoS Comput Biol. 2009 Jul;5(7):e1000443. doi: 10.1371/journal.pcbi.1000443. Epub 2009 Jul 31.

LRpath: a logistic regression approach for identifying enriched biological groups in gene expression data.

Bioinformatics. 2009 Jan 15;25(2):211-7. doi: 10.1093/bioinformatics/btn592. Epub 2008 Nov 27.

Gene set enrichment analysis using linear models and diagnostics.

Bioinformatics. 2008 Nov 15;24(22):2586-91. doi: 10.1093/bioinformatics/btn465. Epub 2008 Sep 11.

GOEAST: a web-based software toolkit for Gene Ontology enrichment analysis.

Nucleic Acids Res. 2008 Jul 1;36(Web Server issue):W358-63. doi: 10.1093/nar/gkn276. Epub 2008 May 16.

ProfCom: a web tool for profiling the complex functionality of gene groups identified from high-throughput data.

Nucleic Acids Res. 2008 Jul 1;36(Web Server issue):W347-51. doi: 10.1093/nar/gkn239. Epub 2008 May 6.

GlobalANCOVA: exploration and assessment of gene group effects.

Bioinformatics. 2008 Jan 1;24(1):78-85. doi: 10.1093/bioinformatics/btm531. Epub 2007 Nov 17.

Identification of prostate cancer modifier pathways using parental strain expression mapping.

Proc Natl Acad Sci U S A. 2007 Nov 6;104(45):17771-6. doi: 10.1073/pnas.0708476104. Epub 2007 Oct 31.

ProbCD: enrichment analysis accounting for categorization uncertainty.

BMC Bioinformatics. 2007 Oct 12;8:383. doi: 10.1186/1471-2105-8-383.
