A comparison of curated gene sets versus transcriptomics-derived gene signatures for detecting pathway activation in immune cells.

Affiliations

Hannover Medical School, Biomedical Research in Endstage and Obstructive Lung Disease Hannover (BREATH), German Center for Lung Research, Carl-Neuberg-Straße, Hannover, 30625, Germany.

Institute of Technical Chemistry, Leibniz University of Hannover, Callinstraße 5, Hannover, 30167, Germany.

Publication information

BMC Bioinformatics. 2020 Jan 28;21(1):28. doi: 10.1186/s12859-020-3366-4.

Abstract

BACKGROUND

Despite the significant contribution of transcriptomics to biological and biomedical research, interpreting long lists of significantly differentially expressed genes remains a challenging step in the analysis process. Gene set enrichment analysis is a standard approach for summarizing differentially expressed genes into pathways or other gene groupings. Here, we explore an alternative to using gene sets from curated databases: we examine a method for deriving custom gene sets relevant to a given experiment from reference data sets of previous transcriptomics studies. We call these data-derived gene sets "gene signatures" for the biological process tested in the previous study. We focus on the feasibility of this approach for analyzing immune-related processes, which are complex in nature but play an important role in medical research.
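The abstract does not spell out how a signature is derived or tested. As a minimal sketch, assuming a signature is simply the set of genes passing an adjusted-p and fold-change cutoff in a reference comparison, and assuming a one-sided hypergeometric (over-representation) test as one possible way to look for that signature among a target comparison's differentially expressed genes, the idea can be written in Python as below. The function names, cutoffs, tuple format, and gene names are hypothetical, and the toy numbers are not data from the study.

```python
from math import comb

def derive_signature(de_table, padj_cutoff=0.05, min_abs_lfc=1.0):
    """Split a reference DE result into up- and down-regulated gene sets.

    de_table: iterable of (gene, log2_fold_change, adjusted_p) tuples.
    The cutoffs are illustrative defaults, not values taken from the study.
    """
    up, down = set(), set()
    for gene, lfc, padj in de_table:
        if padj < padj_cutoff and abs(lfc) >= min_abs_lfc:
            (up if lfc > 0 else down).add(gene)
    return up, down

def hypergeom_overlap_p(signature, target_hits, universe_size):
    """One-sided hypergeometric P(X >= k): is the overlap between a signature
    and the target's DE genes larger than expected by chance?

    Assumes both sets are drawn from the same universe of universe_size genes.
    """
    k = len(signature & target_hits)
    big_k, n = len(signature), len(target_hits)
    tail = sum(
        comb(big_k, i) * comb(universe_size - big_k, n - i)
        for i in range(k, min(big_k, n) + 1)
    )
    return tail / comb(universe_size, n)

# Toy reference comparison (placeholder genes and numbers, not study data)
reference = [
    ("GENE1", 3.2, 1e-8),
    ("GENE2", 2.7, 1e-6),
    ("GENE3", 2.1, 1e-4),
    ("GENE4", -1.5, 0.01),
    ("GENE5", 0.1, 0.9),
]
up, down = derive_signature(reference)
target_hits = {"GENE1", "GENE2", "GENE99"}   # hypothetical target DE genes
p_up = hypergeom_overlap_p(up, target_hits, universe_size=20000)
print(sorted(up), sorted(down), p_up)
```

Splitting the signature by direction keeps up- and down-regulated genes from being pooled, which is one common design choice; a directional or rank-based test would be an equally plausible alternative.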

RESULTS

We evaluate several statistical approaches for detecting the activity of a gene signature in a target data set, and across all of these statistical tests we compare the performance of the data-derived gene signatures with that of comparable GO term gene sets. A total of 61 differential expression comparisons generated from 26 transcriptome experiments were included in the analysis. These experiments covered eight immunological processes in eight types of leukocytes. The data-derived signatures detected the presence of immunological processes in the test data with modest accuracy (AUC = 0.67), whereas GO and literature-based gene sets performed worse (AUC = 0.59). Both approaches suffered from poor specificity.
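The AUC values above summarize how well signature-test results separate comparisons in which an immunological process was actually present from those in which it was not; an AUC of 0.5 corresponds to chance. A minimal sketch, assuming each comparison is scored by -log10 of its signature-test p-value (a hypothetical choice), computes the ROC AUC through the Mann-Whitney U statistic. The p-values are toy numbers, not results from the study.

```python
import math

def roc_auc(pos_scores, neg_scores):
    """ROC AUC as the probability that a random positive outscores a random
    negative (Mann-Whitney U / (n_pos * n_neg)); ties count as one half."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical signature-test p-values (toy numbers, not study results)
p_present = [1e-5, 6.3e-4, 0.08]       # process truly active in the comparison
p_absent = [0.01, 0.4, 0.08, 0.5]      # process not active
auc = roc_auc([-math.log10(p) for p in p_present],
              [-math.log10(p) for p in p_absent])
print(round(auc, 3))  # 0.875
```

The pairwise loop is quadratic but transparent; for larger evaluations a rank-based formula or an existing library routine would give the same value.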

CONCLUSIONS

When investigators seek to test specific hypotheses, the data-derived signature approach can perform as well as, if not better than, standard gene-set-based approaches for immunological signatures. Furthermore, data-derived signatures can be generated in cases where well-defined gene sets are lacking from pathway databases, and they offer the opportunity to define signatures in a cell-type-specific manner. However, neither the data-derived signatures nor standard gene sets could be shown to reliably provide negative predictions for negative cases. We conclude that the data-derived signature approach is a useful and sometimes necessary tool, but analysts should be wary of false positives.
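Specificity is the fraction of truly negative comparisons that a test correctly calls negative, so every false positive lowers it. A minimal sketch, assuming a signature is called "active" whenever its test p-value falls below a hypothetical cutoff alpha = 0.05, shows how sensitivity and specificity could be tallied; the function name and toy p-values are illustrative only.

```python
def sensitivity_specificity(p_present, p_absent, alpha=0.05):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(p < alpha for p in p_present)   # active process, called active
    tn = sum(p >= alpha for p in p_absent)   # inactive process, called inactive
    return tp / len(p_present), tn / len(p_absent)

# Toy p-values (not from the study): two of the four negatives are false positives
sens, spec = sensitivity_specificity([1e-5, 6.3e-4, 0.08], [0.01, 0.4, 0.03, 0.5])
print(sens, spec)  # 0.6666666666666666 0.5
```

A low specificity means signatures are frequently flagged as active in comparisons where the process was absent, which is the failure mode the conclusions warn about.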

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8770/6986093/eed4f3eb83da/12859_2020_3366_Fig1_HTML.jpg