Department of Pediatrics, Stanford University School of Medicine, Stanford, CA 94305, USA.
BMC Med Genomics. 2010 May 6;3:17. doi: 10.1186/1755-8794-3-17.
Many common diseases arise from an interaction between environmental and genetic factors. Our knowledge of gene-environment interactions is growing, but frameworks for associating gene-environment interactions with disease using preexisting, publicly available data have been lacking. Integrating freely available gene-environment interaction data with disease phenotype data would allow hypothesis generation for potential environmental associations with disease.
We integrated publicly available disease-specific gene expression microarray data and curated chemical-gene interaction data to systematically predict environmental chemicals associated with disease. We derived chemical-gene signatures for 1,338 environmental chemicals from the Comparative Toxicogenomics Database (CTD). Through an enrichment test, we associated these chemical-gene signatures with differentially expressed genes from datasets in the Gene Expression Omnibus (GEO).
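The enrichment step described above can be illustrated with a minimal sketch. The abstract does not specify which enrichment test was used, so the one-sided hypergeometric test below is an assumption, and the function names (`hypergeom_sf`, `enrichment_p`) and the toy gene identifiers are hypothetical. The sketch asks whether the genes in one chemical-gene signature overlap a set of differentially expressed genes more than expected by chance.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric: population M, n marked items, N draws."""
    upper = min(n, N)
    total = comb(M, N)
    # comb() returns 0 when the second argument exceeds the first,
    # so impossible overlaps contribute nothing to the sum.
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1)) / total


def enrichment_p(signature, diff_genes, background):
    """Overlap size and one-sided p-value for a chemical-gene signature.

    Assumption: all three arguments are iterables of gene identifiers, and
    `background` is the set of all genes measured on the array.
    """
    background = set(background)
    signature = set(signature) & background
    diff_genes = set(diff_genes) & background
    overlap = len(signature & diff_genes)
    p = hypergeom_sf(overlap, len(background), len(signature), len(diff_genes))
    return overlap, p


# Toy data (hypothetical): 20 measured genes, a 5-gene signature,
# and 5 differentially expressed genes, 4 of which are in the signature.
background = [f"g{i}" for i in range(20)]
signature = ["g0", "g1", "g2", "g3", "g4"]
diff_genes = ["g0", "g1", "g2", "g3", "g10"]

overlap, p_value = enrichment_p(signature, diff_genes, background)
print(overlap, p_value)
```

In a screen over many chemical signatures, the resulting p-values would need a multiple-testing correction before any association is reported; that step is omitted here.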
We verified our analytic method by accurately identifying chemicals applied to samples and cell lines. We also predicted known and novel environmental associations with prostate, lung, and breast cancers, such as estradiol and bisphenol A.
We have developed a scalable statistical method to identify possible environmental associations with disease using publicly available data, and we have validated some of the predicted associations against the literature.