Modelling RNA-Seq data with a zero-inflated mixture Poisson linear model.

Affiliations

Department of Statistics and Applied Probability, National University of Singapore, Singapore.

Department of Statistics, Oregon State University, Corvallis, Oregon.

Publication information

Genet Epidemiol. 2019 Oct;43(7):786-799. doi: 10.1002/gepi.22246. Epub 2019 Jul 22.

Abstract

RNA sequencing (RNA-Seq) has been frequently used in genomic studies and has generated a vast amount of data. The RNA-Seq data are composed of two parts: (a) a sequence of nucleotides of the genome; and (b) a corresponding sequence of counts, giving the number of short reads whose mapped positions start at each position of the genome. One common feature of these count data is that they are typically nonuniform; recent studies have revealed that the nonuniformity is partly due to a systematic bias resulting from sequencing preference. Existing work in the literature models the nonuniformity with a single-component Poisson linear model that incorporates the effects of the sequencing preference. However, we consistently observe that the short reads mapped to a gene may have a mixture structure and can be zero-inflated. A single-component model may not capture the complexity of such data. In this paper, we propose a zero-inflated mixture Poisson linear model for the RNA-Seq count data and derive a fast algorithm based on expectation-maximisation (EM) for estimating the unknown parameters. Numerical studies are conducted to illustrate the effectiveness of our method.
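The abstract describes the model and its EM fitting only at a high level. The following is a minimal sketch, assuming a parameterisation that is not taken from the paper: each genomic position j has a read-start count y_j and a feature vector x_j whose first entry is an intercept; with probability π_0 the count is a structural zero, and with probability π_k it is drawn from a Poisson distribution with mean exp(x_j · β_k). The function names (`fit_zip_mixture`, `component_logs`) and the single binary sequence feature in the simulated data are hypothetical. The M-step takes a few Newton-Raphson steps per component (a generalised EM) rather than an exact maximisation, so it should not be read as the authors' fast algorithm. Only the Python standard library is used.

```python
import math
import random


def safe_log(v):
    """log(v), returning -inf for a zero mixing weight instead of raising."""
    return math.log(v) if v > 0 else -math.inf


def log_sum_exp(vals):
    top = max(vals)
    return top + math.log(sum(math.exp(v - top) for v in vals))


def solve(a, b):
    """Solve the small dense system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def component_logs(y, x, pi, betas):
    """log(pi_k * P(y | z = k)); index 0 is the structural-zero (point mass at 0) component."""
    logs = [safe_log(pi[0]) if y == 0 else -math.inf]
    for k, beta in enumerate(betas, start=1):
        eta = sum(b * xi for b, xi in zip(beta, x))  # assumed log-linear predictor
        logs.append(safe_log(pi[k]) + y * eta - math.exp(eta) - math.lgamma(y + 1))
    return logs


def fit_zip_mixture(ys, xs, n_components=2, n_iter=100, newton_steps=3):
    """Hypothetical EM sketch; assumes xs[j][0] == 1.0 (intercept)."""
    n, p = len(ys), len(xs[0])
    pos = [y for y in ys if y > 0]
    base = math.log(sum(pos) / len(pos)) if pos else 0.0
    pi = [0.5] + [0.5 / n_components] * n_components
    # Spread the starting intercepts so the Poisson components are not symmetric.
    betas = [[base + (k - (n_components - 1) / 2)] + [0.0] * (p - 1) for k in range(n_components)]
    history = []  # observed-data log-likelihood before each update
    for _ in range(n_iter):
        # E-step: posterior membership probabilities (responsibilities).
        resp, ll = [], 0.0
        for y, x in zip(ys, xs):
            logs = component_logs(y, x, pi, betas)
            total = log_sum_exp(logs)
            ll += total
            resp.append([math.exp(v - total) for v in logs])
        history.append(ll)
        # M-step: mixing proportions are average responsibilities.
        pi = [sum(r[k] for r in resp) / n for k in range(n_components + 1)]
        # M-step: weighted Poisson regression per component via Newton-Raphson.
        for k in range(n_components):
            beta = betas[k]
            for _ in range(newton_steps):
                grad = [0.0] * p
                hess = [[1e-8 if i == j else 0.0 for j in range(p)] for i in range(p)]
                for r, y, x in zip(resp, ys, xs):
                    w = r[k + 1]
                    mu = math.exp(sum(b * xi for b, xi in zip(beta, x)))
                    for i in range(p):
                        grad[i] += w * (y - mu) * x[i]
                        for j in range(p):
                            hess[i][j] += w * mu * x[i] * x[j]
                step = solve(hess, grad)
                beta = [b + s for b, s in zip(beta, step)]
            betas[k] = beta
    return pi, betas, history


def poisson_draw(lam):
    """Knuth's multiplication method; adequate for the small means simulated here."""
    limit, k, prod = math.exp(-lam), 0, 1.0
    while True:
        prod *= random.random()
        if prod <= limit:
            return k
        k += 1


# Simulated (not real) data: x = [intercept, hypothetical binary feature at the read start].
random.seed(1)
xs, ys = [], []
for _ in range(400):
    x = [1.0, float(random.random() < 0.5)]
    u = random.random()
    if u < 0.3:
        y = 0  # structural zero
    else:
        beta = (0.5, 0.4) if u < 0.65 else (2.0, 0.4)
        y = poisson_draw(math.exp(beta[0] + beta[1] * x[1]))
    xs.append(x)
    ys.append(y)

pi, betas, history = fit_zip_mixture(ys, xs, n_components=2, n_iter=50)
print("mixing proportions:", [round(v, 3) for v in pi])
print("coefficients:", [[round(b, 3) for b in beta] for beta in betas])
```

With `n_components=1` the same sketch fits a zero-inflated single-component Poisson linear model, which can serve as a baseline when comparing the last recorded entries of `history` (the observed-data log-likelihood).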
