TWO-SIGMA: A novel two-component single cell model-based association method for single-cell RNA-seq data.

Author information

Department of Biostatistics, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA.

Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic Foundation, Cleveland, Ohio, USA.

Publication information

Genet Epidemiol. 2021 Mar;45(2):142-153. doi: 10.1002/gepi.22361. Epub 2020 Sep 29.

Abstract

In this paper, we develop TWO-SIGMA, a TWO-component SInGle cell Model-based Association method for differential expression (DE) analyses in single-cell RNA-seq (scRNA-seq) data. The first component models the probability of "drop-out" with a mixed-effects logistic regression model, and the second component models the (conditional) mean expression with a mixed-effects negative binomial regression model. TWO-SIGMA is extremely flexible in that it: (i) does not require a log-transformation of the outcome, (ii) allows for overdispersed and zero-inflated counts, (iii) accommodates a correlation structure between cells from the same individual via random effect terms, (iv) can analyze unbalanced designs (in which the number of cells does not need to be identical for all samples), (v) can control for additional sample-level and cell-level covariates including batch effects, (vi) provides interpretable effect size estimates, and (vii) enables general tests of DE beyond two-group comparisons. To our knowledge, TWO-SIGMA is the only method for analyzing scRNA-seq data that offers all of these features simultaneously. Simulation studies show that TWO-SIGMA outperforms alternative regression-based approaches in both type-I error control and power when the data contain even moderate within-sample correlation. A real data analysis of pancreatic islet single cells illustrates the flexibility of TWO-SIGMA and demonstrates that incorrectly omitting random effect terms can have dramatic impacts on scientific conclusions. TWO-SIGMA is implemented in the R package twosigma, available at https://github.com/edvanburen/twosigma.
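The two-component structure described in the abstract can be illustrated with a small simulation. The sketch below is not the twosigma package or the authors' code; it is a minimal Python example under stated assumptions. For cell j of individual i, the drop-out probability is assumed to follow logit(p_ij) = alpha0 + a_i, and the conditional mean is assumed to follow log(mu_ij) = beta0 + beta1 * x_i + b_i, where x_i is a two-group indicator and a_i, b_i are independent normal random intercepts. The function name, parameter names, and default values are all hypothetical.

```python
import numpy as np


def simulate_two_component_counts(n_individuals=10, alpha0=-1.0, beta0=1.0,
                                  beta1=0.5, sigma_a=0.5, sigma_b=0.5,
                                  size=2.0, seed=0):
    """Simulate zero-inflated negative binomial counts for one gene.

    All names and defaults are illustrative assumptions, not values
    taken from the paper.
    """
    rng = np.random.default_rng(seed)

    # Unbalanced design: each individual contributes a different number of cells.
    n_cells = rng.integers(50, 200, size=n_individuals)

    # Individual-level random intercepts, one per model component.
    a = rng.normal(0.0, sigma_a, size=n_individuals)  # drop-out component
    b = rng.normal(0.0, sigma_b, size=n_individuals)  # mean component

    ids, groups, counts = [], [], []
    for i in range(n_individuals):
        x_i = i % 2  # alternate individuals between two groups

        # Component 1: logistic model for the drop-out probability.
        p_drop = 1.0 / (1.0 + np.exp(-(alpha0 + a[i])))

        # Component 2: log-linear model for the conditional mean.
        mu = np.exp(beta0 + beta1 * x_i + b[i])

        # NumPy parameterizes the negative binomial by (n, p) with mean
        # n * (1 - p) / p, so p = size / (size + mu) gives mean mu.
        nb = rng.negative_binomial(size, size / (size + mu), size=n_cells[i])

        # Cells flagged as drop-outs are set to zero.
        is_dropout = rng.random(n_cells[i]) < p_drop
        counts.append(np.where(is_dropout, 0, nb))
        ids.append(np.full(n_cells[i], i))
        groups.append(np.full(n_cells[i], x_i))

    return np.concatenate(ids), np.concatenate(groups), np.concatenate(counts)


ids, groups, counts = simulate_two_component_counts()
print("cells:", counts.size, "zero fraction:", (counts == 0).mean())
```

Because cells from the same individual share a_i and b_i, their counts are correlated. This is the within-sample correlation that, according to the abstract, affects type-I error control and power when random effect terms are left out.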
