Statistical methods for expression quantitative trait loci (eQTL) mapping.

Author information

Kendziorski C M, Chen M, Yuan M, Lan H, Attie A D

Affiliations

Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin 53703, USA.

Publication information

Biometrics. 2006 Mar;62(1):19-27. doi: 10.1111/j.1541-0420.2005.00437.x.

PMID:16542225

Abstract

Traditional genetic mapping has largely focused on the identification of loci affecting one, or at most a few, complex traits. Microarrays allow for measurement of thousands of gene expression abundances, themselves complex traits, and a number of recent investigations have considered these measurements as phenotypes in mapping studies. Combining traditional quantitative trait loci (QTL) mapping methods with microarray data is a powerful approach with demonstrated utility in a number of recent biological investigations. These expression quantitative trait loci (eQTL) studies are similar to traditional QTL studies, as a main goal is to identify the genomic locations to which the expression traits are linked. However, eQTL studies probe thousands of expression transcripts; and as a result, standard multi-trait QTL mapping methods, designed to handle at most tens of traits, do not directly apply. One possible approach is to use single-trait QTL mapping methods to analyze each transcript separately. This leads to an increased number of false discoveries, as corrections for multiple tests across transcripts are not made. Similarly, the repeated application, at each marker, of methods for identifying differentially expressed transcripts suffers from multiple tests across markers. Here, we demonstrate the deficiencies of these approaches and propose a mixture over markers (MOM) model that shares information across both markers and transcripts. The utility of all methods is evaluated using simulated data as well as data from an F2 mouse cross in a study of diabetes. Results from simulation studies indicate that the MOM model is best at controlling false discoveries, without sacrificing power. The MOM model is also the only one capable of finding two genome regions previously shown to be involved in diabetes.
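
The multiple-testing problem described in the abstract can be illustrated with a small simulation. The sketch below is not the MOM model and is not taken from the paper. It assumes a single marker with two genotype groups (a real F2 cross has three), unit-variance Gaussian noise, and no true linkage for any transcript. It then compares the number of transcripts flagged at an uncorrected p < 0.05 with the number flagged after a Benjamini-Hochberg correction, a standard procedure swapped in here purely for illustration. All function names and parameters are hypothetical.

```python
import math
import random


def two_group_z_pvalue(values, genotypes):
    """Two-sided p-value for a mean difference between two genotype groups.

    Assumption: the noise variance is known and equal to 1, so the
    z-statistic is exactly N(0, 1) when there is no linkage.
    """
    g0 = [v for v, g in zip(values, genotypes) if g == 0]
    g1 = [v for v, g in zip(values, genotypes) if g == 1]
    diff = sum(g1) / len(g1) - sum(g0) / len(g0)
    z = diff / math.sqrt(1.0 / len(g0) + 1.0 / len(g1))
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for a standard normal.
    return math.erfc(abs(z) / math.sqrt(2.0))


def benjamini_hochberg_count(pvalues, q=0.05):
    """Number of rejections under the Benjamini-Hochberg step-up rule."""
    m = len(pvalues)
    k = 0
    for rank, p in enumerate(sorted(pvalues), start=1):
        if p <= q * rank / m:
            k = rank  # keep the largest rank that passes its threshold
    return k


def simulate_null_scan(n_transcripts=2000, n_animals=60, seed=0):
    """Test every transcript at one marker when no transcript is linked."""
    rng = random.Random(seed)
    genotypes = [i % 2 for i in range(n_animals)]  # hypothetical balanced marker
    pvalues = []
    for _ in range(n_transcripts):
        expression = [rng.gauss(0.0, 1.0) for _ in range(n_animals)]
        pvalues.append(two_group_z_pvalue(expression, genotypes))
    naive_hits = sum(p < 0.05 for p in pvalues)
    bh_hits = benjamini_hochberg_count(pvalues, q=0.05)
    return naive_hits, bh_hits


if __name__ == "__main__":
    naive_hits, bh_hits = simulate_null_scan()
    print(f"uncorrected hits at p < 0.05: {naive_hits}")
    print(f"Benjamini-Hochberg hits at q = 0.05: {bh_hits}")
```

Because every transcript in this simulation is null, each uncorrected hit is a false discovery, and roughly 5% of the transcripts are expected to be flagged. The sketch only covers multiplicity across transcripts at one marker; repeating the scan at every marker adds the second layer of multiplicity that the abstract mentions.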