Lan Hong, Stoehr Jonathan P, Nadler Samuel T, Schueler Kathryn L, Yandell Brian S, Attie Alan D
Department of Biochemistry, University of Wisconsin, 433 Babcock Drive, Madison, WI 53706, USA.
Genetics. 2003 Aug;164(4):1607-14. doi: 10.1093/genetics/164.4.1607.
The advent of sophisticated genomic techniques for gene mapping and microarray analysis has provided opportunities to map mRNA abundance to quantitative trait loci (QTL) throughout the genome. Unfortunately, simply mapping each individual mRNA trait on the scale of a typical microarray experiment is computationally intensive and subject to high sample variance, and is therefore underpowered. However, this problem can be addressed by capitalizing on the correlation among the large number of mRNA traits. We present a method to reduce the dimensionality of gene expression data for mapping as quantitative traits. We used a blind method, principal components, and a sighted method, hierarchical clustering seeded by disease-relevant traits, to define new traits composed of a small collection of promising mRNAs. We validated the principle of our approach by mapping the expression levels of metabolism genes in a population of F2-ob/ob mice derived from the BTBR and C57BL/6J strains. We found that lipogenic and gluconeogenic mRNAs, which are known targets of insulin action, were closely associated with the insulin trait. Multiple interval mapping and Bayesian interval mapping of this new trait revealed significant linkage to chromosomal regions contained within loci associated with type 2 diabetes in this same mouse sample. As a further statistical refinement, we show that principal component analysis also effectively reduces the dimensionality for mapping phenotypes composed of mRNA abundances.
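The two dimension-reduction ideas in the abstract can be illustrated with a small sketch. This is not the authors' code: it assumes expression data arranged as one list of per-mouse values per mRNA, computes the first principal component by power iteration on the correlation matrix (a standard way to obtain it), and replaces the paper's hierarchical clustering with a simplified, hypothetical seeded average-linkage procedure on absolute correlation. All function names, the `min_link` threshold, and the toy data are illustrative assumptions; the interval mapping step itself is not shown.

```python
import math
from typing import Dict, List, Sequence


def standardize(values: Sequence[float]) -> List[float]:
    """Center a trait to mean 0 and scale it to unit (population) SD.

    Assumes the trait is not constant across mice (SD > 0).
    """
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def correlation_matrix(traits: List[List[float]]) -> List[List[float]]:
    """Pearson correlation between every pair of traits (rows = traits, columns = mice)."""
    z = [standardize(t) for t in traits]
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]


def first_pc_scores(traits: List[List[float]], iters: int = 500, tol: float = 1e-12) -> List[float]:
    """Score each mouse on the first principal component of the standardized traits.

    The leading eigenvector of the correlation matrix is found by power iteration;
    projecting each mouse's standardized profile onto it yields one composite trait.
    The sign of a principal component is arbitrary.
    """
    r = correlation_matrix(traits)
    k = len(r)
    # Unequal starting weights make an exactly orthogonal start unlikely.
    v = [float(j + 1) for j in range(k)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(r[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        converged = max(abs(a - b) for a, b in zip(w, v)) < tol
        v = w
        if converged:
            break
    z = [standardize(t) for t in traits]
    n = len(z[0])
    return [sum(v[g] * z[g][i] for g in range(k)) for i in range(n)]


def seeded_cluster(traits: Dict[str, List[float]], seed: str, min_link: float = 0.7) -> List[str]:
    """Hypothetical stand-in: grow one cluster outward from a clinical `seed` trait.

    At each step the trait with the highest mean |r| to the current members joins
    (average linkage on absolute correlation), until no remaining trait reaches
    `min_link`. Returns member names in join order, seed first.
    """
    names = list(traits)
    r = correlation_matrix([traits[name] for name in names])
    idx = {name: i for i, name in enumerate(names)}
    members = [seed]
    remaining = [name for name in names if name != seed]

    def link(name: str) -> float:
        return sum(abs(r[idx[name]][idx[m]]) for m in members) / len(members)

    while remaining:
        best = max(remaining, key=link)
        if link(best) < min_link:
            break
        members.append(best)
        remaining.remove(best)
    return members
```

In use, one would call `seeded_cluster` with a clinical trait such as plasma insulin as the seed, then pass the selected mRNAs to `first_pc_scores` to obtain a single composite phenotype per mouse. That one vector can be scanned for QTL in place of a separate genome scan for every mRNA, which is the point of reducing dimensionality before mapping.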