AUPRC: a metric for evaluating the performance of in-silico perturbation methods in identifying differentially expressed genes.

Author information

Zhu Hongxu, Asiaee Amir, Azinfar Leila, Li Jun, Liang Han, Irajizad Ehsan, Do Kim-Anh, Long James P

Affiliations

Department of Biostatistics and Data Science, The University of Texas Health Science Center at Houston School of Public Health, 1200 Pressler St., 77030, TX, United States.

Department of Biostatistics, Vanderbilt University Medical Center, 2525 West End Avenue, 37203, TN, United States.

Publication information

Brief Bioinform. 2025 Aug 31;26(5). doi: 10.1093/bib/bbaf426.

Abstract

In silico perturbation models, computational methods that can predict cellular responses to perturbations, present an opportunity to reduce the need for costly and time-intensive in vitro experiments. Many recently proposed models predict high-dimensional cellular responses, such as gene or protein expression, to perturbations such as gene knockouts or drugs. However, evaluating in silico performance has largely relied on metrics such as $R^{2}$, which assess overall prediction accuracy but fail to capture biologically significant outcomes like the identification of differentially expressed (DE) genes. In this study, we present a novel evaluation framework that introduces the AUPRC metric to assess the precision and recall of DE gene predictions. By applying this framework to both single-cell and pseudo-bulked datasets, we systematically benchmark simple and advanced computational models. Our results highlight a significant discrepancy between $R^{2}$ and AUPRC, with models achieving high $R^{2}$ values but struggling to identify DE genes, as reflected in their low AUPRC values. This finding underscores the limitations of traditional evaluation metrics and the importance of biologically relevant assessments. Our framework provides a more comprehensive understanding of model capabilities, advancing the application of computational approaches in cellular perturbation research.
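To make the contrast between $R^{2}$ and AUPRC concrete, here is a minimal Python sketch on toy data. It rests on assumptions that are ours, not necessarily the authors' exact procedure: ground-truth DE genes are those whose observed change from control meets a fixed threshold, each gene's predicted DE score is the absolute predicted change from control, and AUPRC is estimated as step-wise average precision. The expression values, the threshold, and the helper names `r_squared` and `average_precision` are hypothetical.

```python
"""Toy comparison of R^2 and AUPRC for predicted perturbation responses.

Hypothetical sketch: DE labels come from a fixed threshold on the observed
change from control, and genes are ranked by |predicted - control|.
"""
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot, across genes."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def average_precision(is_de: Sequence[bool], scores: Sequence[float]) -> float:
    """Step-wise AUPRC estimate: mean precision at the rank of each DE gene.

    Genes are ranked by descending score; ties keep input order because
    Python's sort is stable (a simplification; tie handling varies).
    """
    n_pos = sum(is_de)
    if n_pos == 0:
        raise ValueError("need at least one DE gene")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if is_de[i]:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


# Hypothetical log-expression values for six genes.
control = [12.0, 9.0, 6.0, 3.0, 1.0, 0.0]
truth = [12.0, 9.0, 7.5, 3.0, 2.5, 0.0]  # genes at index 2 and 4 respond
pred = [12.5, 9.5, 6.0, 3.5, 1.0, 0.0]   # stays close to control overall

THRESHOLD = 1.0  # hypothetical DE cutoff on |observed change|
is_de = [abs(t - c) >= THRESHOLD for t, c in zip(truth, control)]
scores = [abs(p - c) for p, c in zip(pred, control)]

print(f"R^2   = {r_squared(truth, pred):.3f}")            # R^2   = 0.949
print(f"AUPRC = {average_precision(is_de, scores):.3f}")  # AUPRC = 0.325
```

On these toy numbers the prediction reaches $R^{2} \approx 0.95$ because it tracks baseline expression levels across genes, yet its AUPRC of 0.325 is close to the fraction of DE genes (2 of 6), roughly what an uninformative ranking would score, since it assigns its largest changes to non-DE genes.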
