Model-free prediction test with application to genomics data.

Author information

Department of Statistics, Iowa State University, Ames, IA 50011.

Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA 15213.

Publication information

Proc Natl Acad Sci U S A. 2022 Aug 23;119(34):e2205518119. doi: 10.1073/pnas.2205518119. Epub 2022 Aug 15.

Abstract

Testing the significance of predictors in a regression model is one of the most important topics in statistics. This problem is especially difficult without any parametric assumptions on the data. This paper aims to test the null hypothesis that, given confounding variables Z, X does not significantly contribute to the prediction of Y under the model-free setting, where X and Z are possibly high dimensional. We propose a general framework that first fits nonparametric machine learning regression algorithms on Y | Z and Y | (X, Z), then compares the prediction power of the two models. The proposed method allows us to leverage the strength of the most powerful regression algorithms developed in the modern machine learning community. The P value for the test can be easily obtained by permutation. In simulations, we find that the proposed method is more powerful than existing methods. The proposed method allows us to draw biologically meaningful conclusions from two gene expression data analyses without strong distributional assumptions: 1) testing the prediction power of sequencing RNA for the proteins in cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq) data and 2) identification of spatially variable genes in spatially resolved transcriptomics data.
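
The framework described in the abstract can be illustrated with a minimal sketch. It writes Y for the response, X for the predictors under test, and Z for the confounders. Everything else is an assumption made for illustration and is not taken from the paper: the function names `knn_predict` and `prediction_test` are hypothetical, a plain k-nearest-neighbour regressor stands in for "any nonparametric machine learning regression algorithm", the data are split in half for fitting and evaluation, and the P value comes from randomly swapping which model produced which loss at each held-out point (a sign-flip randomization of paired loss differences). The paper's actual test statistic and permutation scheme may differ.

```python
import numpy as np


def knn_predict(train_features, train_targets, query_features, k=10):
    """Plain k-nearest-neighbour regression (stand-in for any ML regressor)."""
    # Pairwise squared Euclidean distances, shape (n_query, n_train).
    diffs = query_features[:, None, :] - train_features[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    return train_targets[nearest].mean(axis=1)


def prediction_test(X, Z, Y, k=10, n_perm=999, seed=0):
    """Hypothetical sketch: does adding X to Z improve held-out prediction of Y?"""
    rng = np.random.default_rng(seed)
    n = len(Y)
    order = rng.permutation(n)
    train, test = order[: n // 2], order[n // 2:]

    XZ = np.hstack([X, Z])
    # Reduced model uses Z only; full model uses (X, Z).
    pred_reduced = knn_predict(Z[train], Y[train], Z[test], k)
    pred_full = knn_predict(XZ[train], Y[train], XZ[test], k)

    # Positive values mean the full model predicted that test point better.
    gain = (Y[test] - pred_reduced) ** 2 - (Y[test] - pred_full) ** 2
    observed = gain.mean()

    # Assumption (not from the paper): swap the two models' losses at random
    # within each test point, which flips the sign of the paired difference.
    signs = rng.choice([-1.0, 1.0], size=(n_perm, gain.size))
    null_stats = (signs * gain).mean(axis=1)
    p_value = (1 + np.sum(null_stats >= observed)) / (1 + n_perm)
    return observed, p_value


# Toy usage on simulated data in which Y truly depends on X.
rng = np.random.default_rng(1)
Z = rng.normal(size=(400, 2))
X = rng.normal(size=(400, 1))
Y = 3.0 * X[:, 0] + Z[:, 0] + 0.5 * rng.normal(size=400)
stat, p = prediction_test(X, Z, Y)
print(f"mean loss reduction = {stat:.3f}, p = {p:.4f}")
```

Fitting on one half and scoring on the other keeps the comparison about out-of-sample prediction rather than in-sample fit, which is why any regressor can be dropped in for `knn_predict` without changing the rest of the sketch.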
