Children's Hospital Informatics Program, Harvard-MIT Division of Health Sciences and Technology, Harvard Medical School, Boston, Massachusetts, USA.
BMC Bioinformatics. 2010 Oct 28;11 Suppl 9(Suppl 9):S2. doi: 10.1186/1471-2105-11-S9-S2.
Identification of expression quantitative trait loci (eQTLs) is an emerging area of genomic research. The task requires an integrated analysis of genome-wide single nucleotide polymorphism (SNP) data and gene expression data, which poses a new computational challenge because of the sheer size of the data.
We develop a method to identify eQTLs that represents them as information flux between genetic variants and transcripts. We use information theory to interrogate SNP and gene expression data simultaneously, producing a Transcriptional Information Map (TIM) that captures the network of transcriptional information linking genetic variations, gene expression and regulatory mechanisms. These maps can identify both cis- and trans-regulating eQTLs. Applied to a dataset of leukemia patients, the method identifies eQTLs in the regions of the GART, PCP4, DSCAM, and RIPK4 genes that regulate ADAMTS1, a known leukemia correlate.
The information-theoretic approach presented in this paper infers the dependence networks between SNPs and transcripts, which in turn identify cis- and trans-eQTLs. Applying our method to the leukemia study explains how genetic variants and gene expression are linked to leukemia.
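As an illustration of the kind of SNP-to-transcript dependence the abstract describes, the sketch below estimates the mutual information between one SNP's genotypes and one transcript's expression levels. It is a minimal example under stated assumptions, not the authors' implementation: the abstract does not specify an estimator, so the plug-in (histogram) estimate, the equal-frequency binning of expression, and the names `mutual_information` and `n_bins` are all hypothetical choices made for this example.

```python
import numpy as np

def mutual_information(genotypes, expression, n_bins=3):
    """Plug-in estimate of I(SNP; transcript) in bits.

    Assumptions (not taken from the paper): genotypes are discrete labels
    such as 0/1/2, and expression is discretized into equal-frequency bins.
    """
    genotypes = np.asarray(genotypes)
    expression = np.asarray(expression, dtype=float)

    # Interior quantile cut points give n_bins roughly equal-sized bins.
    edges = np.quantile(expression, np.linspace(0, 1, n_bins + 1)[1:-1])
    expr_bins = np.digitize(expression, edges)  # values in 0 .. n_bins-1

    # Joint count table: rows are genotype classes, columns expression bins.
    g_vals, g_idx = np.unique(genotypes, return_inverse=True)
    joint = np.zeros((len(g_vals), n_bins))
    np.add.at(joint, (g_idx, expr_bins), 1)
    joint /= joint.sum()

    # Marginals, then sum p(g,e) * log2(p(g,e) / (p(g) p(e))) over nonzero cells.
    p_g = joint.sum(axis=1, keepdims=True)
    p_e = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (p_g @ p_e)[nz])))

# Toy data: expression fully determined by genotype vs. unrelated to it.
mi_dependent = mutual_information([0, 0, 1, 1, 2, 2], [1.0, 1.1, 5.0, 5.1, 9.0, 9.1])
mi_independent = mutual_information([0, 0, 1, 1], [1.0, 9.0, 1.0, 9.0], n_bins=2)
```

In a genome-wide setting, a score like this would be computed for many SNP-transcript pairs and compared against a null distribution (for example, from permuted sample labels) before any pair is called an eQTL; that thresholding step is omitted here.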