Modelling RNA-Seq data with a zero-inflated mixture Poisson linear model.

Author information

Department of Statistics and Applied Probability, National University of Singapore, Singapore.

Department of Statistics, Oregon State University, Corvallis, Oregon.

Publication information

Genet Epidemiol. 2019 Oct;43(7):786-799. doi: 10.1002/gepi.22246. Epub 2019 Jul 22.

DOI:10.1002/gepi.22246

PMID:31328831

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6763381/

Abstract

RNA sequencing (RNA-Seq) has been frequently used in genomic studies and has generated a vast amount of data. RNA-Seq data are composed of two parts: (a) a sequence of nucleotides of the genome; and (b) a corresponding sequence of counts, giving the number of short reads whose mapped positions start at each position of the genome. One common feature of these count data is that they are typically nonuniform; recent studies have revealed that the nonuniformity is partially owing to a systematic bias resulting from sequencing preference. Existing work models the nonuniformity with a single-component Poisson linear model that incorporates the effects of sequencing preference. However, we consistently observe that the short reads mapped to a gene may have a mixture structure and can be zero-inflated. A single-component model may not suffice to capture the complexity of such data. In this paper, we propose a zero-inflated mixture Poisson linear model for RNA-Seq count data and derive a fast expectation-maximisation-based algorithm for estimating the unknown parameters. Numerical studies are conducted to illustrate the effectiveness of our method.
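
To make the idea of a zero-inflated Poisson mixture fitted by expectation-maximisation concrete, here is a minimal sketch in Python. It is not the paper's algorithm: the abstract does not give the exact parameterisation, so this example assumes constant Poisson rates per component and omits the linear (sequencing-preference) covariates altogether. The function names, starting weights, and stopping rule are all illustrative assumptions.

```python
import math


def poisson_pmf(y, lam):
    """P(Y = y) for Y ~ Poisson(lam), evaluated via the log scale."""
    return math.exp(y * math.log(lam) - lam - math.lgamma(y + 1))


def fit_zi_poisson_mixture(counts, init_rates, n_iter=500, tol=1e-8):
    """EM for P(y) = w0 * 1[y == 0] + sum_j w[j] * Poisson(y; rates[j]).

    Assumption: intercept-only rates (no covariates), unlike the
    Poisson *linear* model described in the abstract.
    Returns (w0, w, rates, trace), where trace[t] is the log-likelihood
    of the parameters held at the start of iteration t.
    """
    n, k = len(counts), len(init_rates)
    rates = [float(r) for r in init_rates]
    w0 = 0.1                      # illustrative starting weight of the zero spike
    w = [(1.0 - w0) / k] * k      # remaining mass split evenly
    trace = []
    for _ in range(n_iter):
        sum_r0 = 0.0
        sum_r = [0.0] * k
        sum_ry = [0.0] * k
        loglik = 0.0
        # E-step: posterior responsibility of each part for each count
        for y in counts:
            a0 = w0 if y == 0 else 0.0
            a = [w[j] * poisson_pmf(y, rates[j]) for j in range(k)]
            total = a0 + sum(a)
            loglik += math.log(total)
            sum_r0 += a0 / total
            for j in range(k):
                r = a[j] / total
                sum_r[j] += r
                sum_ry[j] += r * y
        trace.append(loglik)
        # M-step: closed-form updates for weights and rates
        w0 = sum_r0 / n
        w = [s / n for s in sum_r]
        rates = [
            max(sum_ry[j] / sum_r[j], 1e-12) if sum_r[j] > 0 else rates[j]
            for j in range(k)
        ]
        # stop once the log-likelihood has stopped improving
        if len(trace) > 1 and trace[-1] - trace[-2] < tol:
            break
    return w0, w, rates, trace


if __name__ == "__main__":
    # Toy counts: many zeros, a low-expression group, a high-expression group.
    toy = [0] * 31 + [1, 2, 1, 3, 2, 2, 1, 2, 3] + [10, 12, 9, 11, 13, 8, 10, 12, 11, 9]
    w0, w, rates, trace = fit_zi_poisson_mixture(toy, [1.0, 8.0])
    print("zero weight:", w0, "weights:", w, "rates:", rates)
```

Extending this sketch to a linear model would replace the closed-form rate update with a responsibility-weighted Poisson regression in each M-step. The direct probability arithmetic above is adequate for moderate counts; very large counts would call for a log-sum-exp formulation.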
