Tsalenko Anya, Sharan Roded, Edvardsen Hege, Kristensen Vessela, Børresen-Dale Anne-Lise, Ben-Dor Amir, Yakhini Zohar
Agilent Technologies, 3500 Deer Creek Road, Palo Alto, CA 94304, USA.
Proc IEEE Comput Syst Bioinform Conf. 2005:135-43. doi: 10.1109/csb.2005.14.
High-throughput expression profiling and genotyping technologies provide the means to study the genetic determinants of population variation in gene expression. In this paper we present a general statistical framework for the simultaneous analysis of gene expression data and SNP genotype data measured for the same cohort. The framework consists of methods to associate transcripts with SNPs affecting their expression, algorithms to detect subsets of transcripts that share significantly many associations with a subset of SNPs, and methods to visualize the identified relations. We apply our framework to SNP-expression data collected from 49 breast cancer patients. Our results demonstrate an overabundance of transcript-SNP associations in these data and pinpoint SNPs that are potential master regulators of transcription. We also identify several statistically significant transcript subsets with common putative regulators that fall into well-defined functional categories.
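The abstract does not say which association statistic the framework uses, so the following is only an illustrative sketch of the general idea of scoring transcript-SNP pairs and checking for an overabundance of associations. It assumes genotypes coded as minor-allele counts (0/1/2) and swaps in a permutation test on the absolute Pearson correlation as a stand-in score. All function names (`abs_correlation`, `permutation_pvalue`, `overabundance`) and parameters are hypothetical and do not come from the paper.

```python
import random


def abs_correlation(x, y):
    """Absolute Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector carries no association signal
    return abs(sxy) / (sxx * syy) ** 0.5


def permutation_pvalue(genotype, expression, n_perm=200, rng=None):
    """Permutation p-value for one transcript-SNP pair (assumed score: |r|)."""
    rng = rng or random.Random(0)
    observed = abs_correlation(genotype, expression)
    shuffled = list(expression)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-expression link
        if abs_correlation(genotype, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


def overabundance(genotypes, expressions, alpha=0.05, n_perm=200, seed=0):
    """Score every pair and compare the observed count of pairs with
    p <= alpha against the count expected if all p-values were uniform."""
    rng = random.Random(seed)
    pvals = {
        (t, s): permutation_pvalue(g, e, n_perm, rng)
        for t, e in expressions.items()
        for s, g in genotypes.items()
    }
    observed = sum(p <= alpha for p in pvals.values())
    expected = alpha * len(pvals)
    return pvals, observed, expected
```

In this sketch, `genotypes` and `expressions` are dictionaries mapping SNP and transcript identifiers to per-patient vectors in the same patient order. An observed count well above the expected count would suggest more associations than chance alone explains. A real analysis would also need to account for multiple testing and for dependence between SNPs.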