Differential expression analysis for RNAseq using Poisson mixed models.

Author information

Sun Shiquan, Hood Michelle, Scott Laura, Peng Qinke, Mukherjee Sayan, Tung Jenny, Zhou Xiang

Affiliations

Systems Engineering Institute, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, P.R. China.

Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA.

Publication information

Nucleic Acids Res. 2017 Jun 20;45(11):e106. doi: 10.1093/nar/gkx204.

DOI: 10.1093/nar/gkx204
PMID: 28369632
Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5499851/
Abstract

Identifying differentially expressed (DE) genes from RNA sequencing (RNAseq) studies is among the most common analyses in genomics. However, RNAseq DE analysis presents several statistical and computational challenges, including over-dispersed read counts and, in some settings, sample non-independence. Previous count-based methods rely on simple hierarchical Poisson models (e.g. negative binomial) to model independent over-dispersion, but do not account for sample non-independence due to relatedness, population structure and/or hidden confounders. Here, we present a Poisson mixed model with two random effects terms that account for both independent over-dispersion and sample non-independence. We also develop a scalable sampling-based inference algorithm using a latent variable representation of the Poisson distribution. With simulations, we show that our method properly controls for type I error and is generally more powerful than other widely used approaches, except in small samples (n < 15) with other unfavorable properties (e.g. small effect sizes). We also apply our method to three real datasets that contain related individuals, population stratification or hidden confounders. Our results show that our method increases power in all three datasets compared to other approaches, though the power gain is smallest in the smallest sample (n = 6). Our method is implemented in MACAU, freely available at www.xzlab.org/software.html.
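The abstract describes the model only in words. The sketch below simulates read counts for a single gene from one common way of writing a Poisson mixed model with two random effects, assuming the form y_i ~ Poisson(N_i * lambda_i) with log(lambda_i) = mu + beta * x_i + g_i + e_i. Here N_i is a per-sample scaling such as total read depth, g ~ MVN(0, sigma2 * h2 * K) is a correlated random effect driven by a sample-relatedness matrix K, and e ~ MVN(0, sigma2 * (1 - h2) * I) is an independent over-dispersion term. This parameterization, the function names (`cholesky`, `poisson_draw`, `simulate_pmm_counts`, `mean_and_variance`) and the simple Poisson sampler are illustrative assumptions. It is not MACAU's code and does not implement the paper's sampling-based inference algorithm; it only generates data of the kind such a model is meant to fit.

```python
import math
import random


def cholesky(K):
    """Return lower-triangular L with K = L L^T (K must be symmetric positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(K[i][i] - s)
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    return L


def poisson_draw(lam, rng):
    """Poisson(lam) draw: count arrivals of a unit-rate Poisson process in [0, lam].

    Runs in O(lam) time, so it is only meant for moderate rates.
    """
    count = 0
    t = rng.expovariate(1.0)
    while t <= lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def simulate_pmm_counts(x, beta, K, depth, sigma2, h2, mu=0.0, seed=0):
    """Simulate one gene's counts under an assumed two-random-effect Poisson mixed model.

    Assumed parameterization (illustrative, not taken from MACAU's source):
      y_i ~ Poisson(depth_i * lambda_i)
      log(lambda_i) = mu + beta * x_i + g_i + e_i
      g ~ MVN(0, sigma2 * h2 * K)        # correlated effect (relatedness / structure)
      e ~ MVN(0, sigma2 * (1 - h2) * I)  # independent over-dispersion
    """
    rng = random.Random(seed)
    n = len(x)
    L = cholesky(K)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # g = sqrt(sigma2 * h2) * L z has covariance sigma2 * h2 * K
    scale_g = math.sqrt(sigma2 * h2)
    g = [scale_g * sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
    sd_e = math.sqrt(sigma2 * (1.0 - h2))
    e = [rng.gauss(0.0, sd_e) for _ in range(n)]
    counts = []
    for i in range(n):
        log_rate = mu + beta * x[i] + g[i] + e[i]
        counts.append(poisson_draw(depth[i] * math.exp(log_rate), rng))
    return counts


def mean_and_variance(values):
    """Sample mean and unbiased sample variance."""
    n = len(values)
    m = sum(values) / n
    v = sum((val - m) ** 2 for val in values) / (n - 1)
    return m, v


if __name__ == "__main__":
    n = 200
    K = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # unrelated samples
    x = [i % 2 for i in range(n)]  # hypothetical binary predictor of interest
    y = simulate_pmm_counts(x, beta=0.0, K=K, depth=[1.0] * n,
                            sigma2=0.5, h2=0.3, mu=math.log(20.0))
    m, v = mean_and_variance(y)
    # The variance is expected to be well above the mean (over-dispersion).
    print(f"mean={m:.1f}, variance={v:.1f}")
```

With h2 = 0 the correlated term vanishes and only independent log-normal over-dispersion remains, playing a role similar to the gamma mixing behind the negative binomial used by earlier count-based methods. A non-diagonal K (for example, a kinship matrix) makes the counts of related samples correlated, which is the sample non-independence the abstract refers to. The pure-Python Cholesky step is O(n^3), so this sketch suits only small examples.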

Figures (image links):
Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c835/5499851/a8f4c15c0fb2/gkx204fig1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c835/5499851/c71970161c78/gkx204fig2.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c835/5499851/baa9c72e685f/gkx204fig3.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c835/5499851/7d459e1be6d9/gkx204fig4.jpg
