Integrative genomics: quantifying significance of phenotype-genotype relationships from multiple sources of high-throughput data.

Affiliation

Department of Medicine, University of Chicago, Chicago, IL, USA.

Publication information

Front Genet. 2013 May 31;3:202. doi: 10.3389/fgene.2012.00202. eCollection 2012.

Abstract

Given recent advances in the generation of high-throughput data such as whole-genome genetic variation and transcriptome expression, it is critical to develop novel methods to integrate these heterogeneous datasets and to assess the significance of identified phenotype-genotype relationships. Recent studies show that genome-wide association findings are likely to fall in loci with gene regulatory effects such as expression quantitative trait loci (eQTLs), demonstrating the utility of such integrative approaches. For cases in which genotype and gene expression data are available on the same individuals, we and others have developed methods wherein top phenotype-associated genetic variants are prioritized if they are associated, as eQTLs, with gene expression traits that are themselves associated with the phenotype. Yet there has been no method to determine an overall p-value for the findings that arise specifically from the integrative nature of the approach. We propose a computationally feasible permutation method that accounts for the assimilative nature of the method and the correlation structure among gene expression traits and among genotypes. We apply the method to data from a study of cellular sensitivity to etoposide, one of the most widely used chemotherapeutic drugs. To our knowledge, this study is the first statistically sound quantification of the overall significance of the genotype-phenotype relationships resulting from applying an integrative approach. This method can be easily extended to cases in which gene expression data are replaced by other molecular phenotypes of interest, e.g., microRNA or proteomic data. This work has important implications for studies seeking to extend genetic association studies through the use of omics data. Finally, we provide R code to compute the empirical false discovery rate when p-values for the observed and simulated phenotypes are available.
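The empirical false discovery rate mentioned in the last sentence can be illustrated with a short sketch. This is not the authors' R code; it is a minimal Python example of one common estimator, assuming that the p-values from each simulated (permuted) phenotype approximate the null distribution. The function name `empirical_fdr`, its parameters, and the toy p-values are all hypothetical and chosen only for illustration. At a threshold t, the estimate is the average number of simulated p-values at or below t per simulation, divided by the number of observed p-values at or below t.

```python
def empirical_fdr(observed_p, simulated_p, threshold):
    """Estimate the FDR at a p-value threshold (illustrative sketch only).

    observed_p  : list of p-values from the observed phenotype
    simulated_p : list of lists, one list of p-values per simulated phenotype
    threshold   : p-value cutoff used to call a finding significant
    """
    n_sims = len(simulated_p)
    if n_sims == 0:
        raise ValueError("at least one simulated phenotype is required")

    # Number of findings called significant in the observed data.
    observed_hits = sum(p <= threshold for p in observed_p)
    if observed_hits == 0:
        # Assumed convention: no discoveries means no false discoveries.
        return 0.0

    # Average number of significant findings per simulated (null) phenotype.
    null_hits = sum(sum(p <= threshold for p in sim) for sim in simulated_p)
    expected_false = null_hits / n_sims

    # The ratio can exceed 1 when the null produces more hits; cap it at 1.
    return min(1.0, expected_false / observed_hits)


# Toy data, invented for illustration.
observed = [0.001, 0.004, 0.03, 0.2, 0.6]
simulated = [
    [0.02, 0.3, 0.5, 0.7, 0.9],
    [0.008, 0.1, 0.4, 0.6, 0.8],
]

fdr_strict = empirical_fdr(observed, simulated, 0.01)
fdr_loose = empirical_fdr(observed, simulated, 0.05)
```

With these toy values, two observed p-values fall at or below 0.01 and the simulations contribute an average of 0.5 such values, so the estimate at that threshold is 0.25. For the estimate to be meaningful in the integrative setting, the simulated p-values would need to come from the same full pipeline as the observed ones.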

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3ca0/3668276/ef895171a232/fgene-03-00202-g001.jpg
