Pseudobulk with proper offsets has the same statistical properties as generalized linear mixed models in single-cell case-control studies.

Affiliations

Department of Medicine, Seoul National University College of Medicine, Seoul, 03080, Republic of Korea.

Department of Statistics, University of Michigan, Ann Arbor, 48109, United States.

Publication information

Bioinformatics. 2024 Aug 2;40(8). doi: 10.1093/bioinformatics/btae498.

DOI: 10.1093/bioinformatics/btae498
PMID: 39115884
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11343365/

Abstract

MOTIVATION

Generalized linear mixed models (GLMMs), such as the negative-binomial or Poisson linear mixed model, are widely applied to single-cell RNA sequencing data to compare transcript expression between different conditions determined at the subject level. However, the model is computationally intensive, and its relative statistical performance to pseudobulk approaches is poorly understood.

RESULTS

We propose offset-pseudobulk as a lightweight alternative to GLMMs. We prove that a count-based pseudobulk equipped with a proper offset variable has the same statistical properties as GLMMs in terms of both point estimates and standard errors. We confirm our findings using simulations based on real data. Offset-pseudobulk is substantially faster (>×10) and numerically more stable than GLMMs.

AVAILABILITY AND IMPLEMENTATION

Offset pseudobulk can be easily implemented in any generalized linear model software by tweaking a few options. The codes can be found at https://github.com/hanbin973/pseudobulk_is_mm.
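
The idea of fitting a count-based pseudobulk with an offset can be illustrated with a small, dependency-free sketch. It is not the authors' code (see the repository above for that). It assumes, since the abstract does not spell it out, that the "proper offset" is the log of the summed per-cell size factors for each subject, and it uses a plain Poisson likelihood with a single binary condition for brevity; a real analysis would typically use a negative-binomial or quasi-likelihood GLM to allow for between-subject overdispersion. The function names `offset_pseudobulk` and `fit_poisson_offset` and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def offset_pseudobulk(cells):
    """Collapse per-cell records into one row per subject.

    cells: iterable of (subject_id, condition, count, size_factor).
    Returns rows of (condition, summed_count, offset), where the offset is
    ASSUMED to be log(sum of the subject's per-cell size factors).
    """
    counts = defaultdict(float)
    sizes = defaultdict(float)
    condition_of = {}
    for subject, condition, count, size in cells:
        counts[subject] += count
        sizes[subject] += size
        condition_of[subject] = condition
    return [(condition_of[s], counts[s], math.log(sizes[s])) for s in sorted(counts)]


def fit_poisson_offset(rows, n_iter=50, tol=1e-12):
    """Fit log E[y] = offset + b0 + b1 * condition by Newton-Raphson."""
    total_y = sum(y for _, y, _ in rows)
    total_exposure = sum(math.exp(o) for _, _, o in rows)
    b0, b1 = math.log(total_y / total_exposure), 0.0  # pooled-rate start
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y, o in rows:
            mu = math.exp(o + b0 + b1 * x)  # fitted mean, offset included
            g0 += y - mu                    # score for the intercept
            g1 += x * (y - mu)              # score for the condition effect
            h00 += mu                       # Fisher information entries
            h01 += x * mu
            h11 += x * x * mu
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det    # solve the 2x2 Newton system
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Toy data: subjects A and B are controls (0), C and D are cases (1).
cells = [
    ("A", 0, 3, 1.0), ("A", 0, 5, 2.0),
    ("B", 0, 2, 1.0),
    ("C", 1, 8, 1.0), ("C", 1, 10, 1.5),
    ("D", 1, 6, 0.5),
]
rows = offset_pseudobulk(cells)
b0, b1 = fit_poisson_offset(rows)
print(f"log fold change (case vs control): {b1:.4f}")
```

With only an intercept and one binary covariate, the Poisson fit has a closed form (each group's total count divided by its total size factor), which gives a quick sanity check on the iterative fit. In standard GLM software the same model is usually obtained by passing the per-subject offset through the fitting routine's offset option.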

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3004/11343365/09be3c614819/btae498f1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3004/11343365/c94ac7ab209a/btae498f2.jpg
