Rashid Naim U, Sun Wei, Ibrahim Joseph G
University of North Carolina at Chapel Hill.
Fred Hutchinson Cancer Research Center.
Ann Appl Stat. 2016;10(4):2254-2273. doi: 10.1214/16-AOAS973. Epub 2017 Jan 5.
Sequencing techniques have been widely used to assess gene expression (i.e., RNA-seq) or the presence of epigenetic features (e.g., DNase-seq to identify open chromatin regions). In contrast to traditional microarray platforms, sequencing data are typically summarized in the form of discrete counts, and they can delineate allele-specific signals, which are not available from microarrays. The presence of epigenetic features is often associated with gene expression, and both have been shown to be affected by DNA polymorphisms. However, joint models with the flexibility to assess interactions among gene expression, epigenetic features and DNA polymorphisms are currently lacking. In this paper, we develop a statistical model to assess the associations between gene expression and epigenetic features using sequencing data, while explicitly modeling the effects of DNA polymorphisms in either an allele-specific or nonallele-specific manner. We show that doing so provides the flexibility to detect associations between gene expression and epigenetic features, as well as conditional associations given DNA polymorphisms. We evaluate the performance of our method using simulations and apply it to study the association between gene expression and the presence of DNase I hypersensitive sites (DHSs) in HapMap individuals. Our model can be generalized to explore the relationships between DNA polymorphisms and any two types of sequencing experiments, a useful feature as the variety of sequencing experiments continues to expand.