Department of Statistics, Iowa State University.
Department of Statistics, Iowa State University, Department of Veterinary Diagnostic and Production Animal Medicine and.
Bioinformatics. 2016 Jun 1;32(11):1701-8. doi: 10.1093/bioinformatics/btw061. Epub 2016 Feb 1.
Transposon insertion sequencing (Tn-seq) is an emerging technology that combines transposon mutagenesis with next-generation sequencing to identify genes related to bacterial survival. The data from Tn-seq experiments consist of sequence reads mapped to millions of potential transposon insertion sites, and a large portion of insertion sites have zero mapped reads. Novel statistical methods for Tn-seq data analysis are needed to infer the functions of genes in bacterial growth.
In this article, we propose a zero-inflated Poisson model for analyzing Tn-seq data, which are high-dimensional and contain an excess of zeros. Maximum likelihood estimates of the model parameters are obtained using an expectation-maximization (EM) algorithm, and pseudogenes are used to construct appropriate statistical tests for the transposon insertion tolerance of normal genes of interest. We propose a multiple testing procedure that assigns each gene to one of three states (hypo-tolerant, tolerant or hyper-tolerant) while controlling the false discovery rate. We evaluate the proposed method with simulation studies and apply it to a real Tn-seq dataset from an experiment that studied the bacterial pathogen Campylobacter jejuni.
Availability and implementation: We provide R code for implementing our proposed method at http://github.com/ffliu/TnSeq. A user's guide with example data analysis is also available there.
Supplementary data are available at Bioinformatics online.
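To make the estimation step concrete, the following is a minimal sketch of EM for a single zero-inflated Poisson distribution, in which each count is a structural zero with probability `pi` and otherwise Poisson with mean `lam`. It is not the authors' implementation: the published model involves gene-level structure and pseudogene-based tests that are not reproduced here, and the function names `fit_zip_em`, `sample_poisson` and `simulate_zip`, together with the simulation settings, are assumptions made for illustration.

```python
import math
import random


def fit_zip_em(counts, n_iter=200, tol=1e-10):
    """Fit a one-component zero-inflated Poisson by EM (illustrative sketch).

    Returns (pi, lam): pi is the estimated probability of a structural zero,
    lam is the estimated Poisson mean of the non-structural part.
    """
    n = len(counts)
    n_zero = sum(1 for y in counts if y == 0)
    total = sum(counts)

    # Crude starting values: half of the observed zeros are structural.
    pi = 0.5 * n_zero / n
    lam = max(total / n, 1e-8)

    for _ in range(n_iter):
        # E-step: posterior probability that an observed zero is structural.
        # Positive counts can never be structural zeros, so they contribute 0.
        p_struct = pi / (pi + (1.0 - pi) * math.exp(-lam))
        expected_struct = n_zero * p_struct

        # M-step: closed-form updates given the expected memberships.
        new_pi = expected_struct / n
        new_lam = total / (n - expected_struct)

        converged = abs(new_pi - pi) < tol and abs(new_lam - lam) < tol
        pi, lam = new_pi, new_lam
        if converged:
            break
    return pi, lam


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_zip(n, pi, lam, seed=0):
    """Simulate n zero-inflated Poisson counts (hypothetical test data)."""
    rng = random.Random(seed)
    return [0 if rng.random() < pi else sample_poisson(lam, rng)
            for _ in range(n)]


if __name__ == "__main__":
    reads = simulate_zip(n=20000, pi=0.6, lam=3.0, seed=1)
    pi_hat, lam_hat = fit_zip_em(reads)
    print(f"pi_hat={pi_hat:.3f}, lam_hat={lam_hat:.3f}")
```

Because only zero counts can be structural zeros, the E-step reduces to a single posterior probability shared by all zeros, which keeps each iteration linear in the number of insertion sites only through the initial tallies.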