Silver: Forging almost Gold Standard Datasets.

Affiliations

Augmented Intelligence & Precision Health Laboratory, Institute of the McGill University Health Centre, McGill University, Montreal, QC H4A 3S5, Canada.

Department of Computer Science, University of Saskatchewan, Saskatoon, SK S7N 5C9, Canada.

Publication information

Genes (Basel). 2021 Sep 28;12(10):1523. doi: 10.3390/genes12101523.

DOI:10.3390/genes12101523

PMID:34680918

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8535810/

Abstract

Gene set analysis has been widely used to gain insight from high-throughput expression studies. Although various tools and methods have been developed for gene set analysis, there is no consensus among researchers regarding best practice(s). Most often, evaluation studies have reported contradictory recommendations of which methods are superior. Therefore, an unbiased quantitative framework for evaluations of gene set analysis methods will be valuable. Such a framework requires gene expression datasets where the enrichment status of gene sets is known. In the absence of such gold standard datasets, artificial datasets are commonly used for evaluations of gene set analysis methods; however, they often rely on oversimplifying assumptions that make them biased in favor of or against a given method. In this paper, we propose a quantitative framework for evaluation of gene set analysis methods by synthesizing expression datasets using real data, without relying on oversimplifying or unrealistic assumptions, while preserving complex gene-gene correlations and retaining the distribution of expression values. The utility of the quantitative approach is shown by evaluating ten widely used gene set analysis methods. An implementation of the proposed method is publicly available. We suggest using Silver to evaluate existing and new gene set analysis methods. Evaluation using Silver provides a better understanding of current methods and can aid in the development of gene set analysis methods to achieve higher specificity without sacrificing sensitivity.
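To make the sensitivity and specificity mentioned above concrete, the following is a minimal sketch of how the two quantities could be computed once the enrichment status of every gene set is known. It is not the Silver implementation; the function name `sensitivity_specificity`, the gene set identifiers, and the example calls are all hypothetical and chosen only for illustration.

```python
def sensitivity_specificity(truth, called, all_sets):
    """Score one method's calls against a known enrichment status.

    truth    -- IDs of gene sets known to be enriched (hypothetical ground truth)
    called   -- IDs of gene sets the method reported as enriched
    all_sets -- IDs of every gene set that was tested
    """
    truth, called, all_sets = set(truth), set(called), set(all_sets)
    negatives = all_sets - truth          # gene sets known not to be enriched

    tp = len(truth & called)              # enriched and reported
    fn = len(truth - called)              # enriched but missed
    fp = len(negatives & called)          # not enriched but reported
    tn = len(negatives - called)          # not enriched and not reported

    # Guard against empty classes rather than dividing by zero.
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Illustrative data only: five tested gene sets, two of them truly enriched.
all_sets = {"GS1", "GS2", "GS3", "GS4", "GS5"}
truth = {"GS1", "GS2"}
called = {"GS1", "GS3"}

sens, spec = sensitivity_specificity(truth, called, all_sets)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this toy case the method recovers one of the two enriched gene sets and wrongly reports one of the three non-enriched ones, so sensitivity is 1/2 and specificity is 2/3.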
