Bergen Andrew W, Baccarelli Andrea, McDaniel Timothy K, Kuhn Kenneth, Pfeiffer Ruth, Kakol Jerry, Bender Patrick, Jacobs Kevin, Packer Bernice, Chanock Stephen J, Yeager Meredith
Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD USA.
BMC Genomics. 2007 Aug 29;8:296. doi: 10.1186/1471-2164-8-296.
Sequence variation and transcriptional variation within and between individuals are typically studied independently. The joint analysis of sequence and gene expression variation (genetical genomics) provides insight into the role of linked sequence variation in the regulation of gene expression. We investigated the effect of sequence variation in cis on gene expression (cis sequence effects) in lymphoblastoid cell lines for a group of genes commonly studied in cancer research. Using three different analytical approaches, we estimated the proportion of genes exhibiting cis sequence effects and the proportion of gene expression variation explained by these effects, and compared our results with the literature.
For this study, we generated gene expression profiling data for N = 697 candidate genes in N = 30 lymphoblastoid cell lines, and used available candidate gene resequencing data for N = 552 candidate genes to identify N = 30 candidate genes with sufficient variance in both datasets to investigate cis sequence effects. We used two additive models and Templeton's haplotype phylogeny scanning approach (Tree Scanning) to evaluate the association of log-transformed gene expression with individual SNPs, with all SNPs at a gene, and with diplotypes. SNPs and diplotypes at eight candidate genes exhibited statistically significant (p < 0.05) association with gene expression. Using the literature as a "gold standard" for the 14 genes with data from both this study and the literature, we observed 80% concordance for genes exhibiting significant cis sequence effects in our study and 85% concordance for genes not exhibiting them.
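To make the idea of an additive single-SNP test concrete, here is a minimal sketch; it is not the authors' code. It assumes genotypes are coded as minor-allele counts (0, 1, 2) and expression is already log-transformed, and it fits an ordinary least-squares line whose R² is the proportion of expression variance explained by the SNP. The significance test is a swapped-in assumption: a permutation test that shuffles expression across cell lines, rather than whatever parametric test the additive models in the study used. The function names `ols_additive` and `permutation_p` are hypothetical, and the data in the usage lines are simulated, not taken from the study. Tree Scanning and the all-SNPs and diplotype models are not shown.

```python
"""Minimal sketch of an additive single-SNP test for a cis sequence effect.

Assumptions (not from the study): genotypes are minor-allele counts
(0, 1, 2), expression is log-transformed, and significance is assessed
by permutation. All names and data here are hypothetical.
"""
import random


def ols_additive(genotypes, log_expr):
    """Fit log_expr = intercept + slope * genotype by least squares.

    Returns (slope, intercept, r_squared), where r_squared is the
    proportion of expression variance explained by the additive SNP term.
    """
    n = len(genotypes)
    if n != len(log_expr) or n < 3:
        raise ValueError("need matched vectors of length >= 3")
    mean_g = sum(genotypes) / n
    mean_y = sum(log_expr) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    if sxx == 0:
        raise ValueError("SNP is monomorphic in this sample")
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, log_expr))
    syy = sum((y - mean_y) ** 2 for y in log_expr)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_g
    # R^2 of a simple regression is the squared Pearson correlation.
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r_squared


def permutation_p(genotypes, log_expr, n_perm=10000, seed=1):
    """Permutation p-value for R^2: shuffle expression across cell lines."""
    observed = ols_additive(genotypes, log_expr)[2]
    rng = random.Random(seed)
    shuffled = list(log_expr)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if ols_additive(genotypes, shuffled)[2] >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Usage with simulated data for 30 hypothetical cell lines.
rng = random.Random(0)
genos = [0, 1, 2] * 10
expr = [5.0 + 0.8 * g + rng.gauss(0, 0.3) for g in genos]
slope, intercept, r2 = ols_additive(genos, expr)
p = permutation_p(genos, expr, n_perm=2000)
print(f"slope={slope:.2f}  R^2={r2:.2f}  p={p:.4f}")
```

A permutation test was chosen here only because it needs no distributional tables and keeps the sketch dependency-free. With 30 cell lines, the smallest attainable p-value is bounded by the number of permutations.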
Based on our results and the extant literature, about one in four genes exhibits significant cis sequence effects, and for these genes, about 30% of gene expression variation is accounted for by cis sequence variation. Despite the diversity of experimental approaches, previously published studies largely support our findings on the presence or absence of significant cis sequence effects.