Modelling RNA-Seq data with a zero-inflated mixture Poisson linear model.

Affiliations

Department of Statistics and Applied Probability, National University of Singapore, Singapore.

Department of Statistics, Oregon State University, Corvallis, Oregon.

Publication information

Genet Epidemiol. 2019 Oct;43(7):786-799. doi: 10.1002/gepi.22246. Epub 2019 Jul 22.

DOI: 10.1002/gepi.22246
PMID: 31328831
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6763381/

Abstract

RNA sequencing (RNA-Seq) has been frequently used in genomic studies and has generated a vast amount of data. RNA-Seq data are composed of two parts: (a) a sequence of nucleotides of the genome; and (b) a corresponding sequence of counts, giving the number of short reads whose mapped positions start at each position of the genome. One common feature of these count data is that they are typically nonuniform; recent studies have revealed that the nonuniformity is partially owing to a systematic bias resulting from the sequencing preference. Existing works in the literature model the nonuniformity with a single-component Poisson linear model that incorporates the effects of the sequencing preference. However, we consistently observe that the short reads mapped to a gene may have a mixture structure and can be zero-inflated. A single-component model may not suffice to capture the complexity of such data. In this paper, we propose a zero-inflated mixture Poisson linear model for RNA-Seq count data and derive a fast expectation-maximisation-based algorithm for estimating the unknown parameters. Numerical studies are conducted to illustrate the effectiveness of our method.
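
To make the idea of a zero-inflated Poisson mixture fitted by expectation-maximisation concrete, the following is a minimal sketch and not the authors' algorithm. It assumes each count is either a structural zero (weight `weights[0]`) or drawn from one of `n_components` Poisson distributions with a constant rate; the paper's model additionally lets the rates depend on the local nucleotide sequence through a linear predictor, which is omitted here. The function name `fit_zero_inflated_poisson_mixture` and all of its parameters are hypothetical.

```python
from math import lgamma

import numpy as np


def fit_zero_inflated_poisson_mixture(y, n_components=2, n_iter=500, tol=1e-8):
    """EM for a zero-inflated mixture of constant-rate Poisson components.

    Illustrative simplification: no sequence covariates are used.
    Assumes y holds non-negative integer counts with at least one positive value.
    Returns (weights, rates, log_likelihood); weights[0] is the zero-inflation weight.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    is_zero = (y == 0).astype(float)
    log_fact = np.array([lgamma(v + 1.0) for v in y])  # log(y!)

    # Initialise rates from quantiles of the positive counts, weights uniformly.
    positive = y[y > 0]
    rates = np.quantile(positive, np.linspace(0.25, 0.75, n_components)).astype(float)
    weights = np.full(n_components + 1, 1.0 / (n_components + 1))

    prev_loglik = -np.inf
    loglik = prev_loglik
    for _ in range(n_iter):
        # E-step: weighted density of every observation under every component.
        dens = np.empty((n, n_components + 1))
        dens[:, 0] = weights[0] * is_zero  # point mass at zero
        for k in range(n_components):
            log_pmf = y * np.log(rates[k]) - rates[k] - log_fact
            dens[:, k + 1] = weights[k + 1] * np.exp(log_pmf)
        total = dens.sum(axis=1)
        resp = dens / total[:, None]  # posterior membership probabilities
        loglik = float(np.log(total).sum())

        # M-step: closed-form updates for the weights and the Poisson rates.
        weights = resp.mean(axis=0)
        mass = np.maximum(resp[:, 1:].sum(axis=0), 1e-12)
        rates = np.maximum((resp[:, 1:] * y[:, None]).sum(axis=0) / mass, 1e-12)

        # Stop once the log-likelihood no longer improves noticeably.
        if loglik - prev_loglik < tol:
            break
        prev_loglik = loglik

    return weights, rates, loglik
```

Calling `fit_zero_inflated_poisson_mixture(counts, n_components=2)` on a vector of per-position read-start counts returns the estimated zero-inflation weight, the component weights, and the component rates. The densities are exponentiated directly for readability; for very large counts a log-sum-exp formulation would be numerically safer.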
