Cantor Rita M, Cordell Heather J
Department of Human Genetics, David Geffen School of Medicine at UCLA, 695 Charles E. Young Dr, South, Los Angeles, CA, 90024-7088, USA.
Institute of Genetic Medicine, Newcastle University, International Centre for Life, Central Parkway, Newcastle upon Tyne, NE1 3BZ, UK.
BMC Genet. 2016 Feb 3;17 Suppl 2(Suppl 2):3. doi: 10.1186/s12863-015-0311-z.
Microarray technologies now make it possible to quantify messenger RNA (mRNA) transcript abundance genome-wide. Analyzing genotype, phenotype, and expression data from 20 pedigrees, the members of our Genetic Analysis Workshop (GAW) 19 gene expression group published 9 papers that tackled timely and important questions. To study the complexity and interrelationships of genetics and gene expression, we used established statistical tools, developed new ones, and developed and applied extensions to existing tools.
To study gene expression correlations among pedigree members (without incorporating genotype or trait data into the analysis), 2 papers used principal components analysis, weighted gene coexpression network analysis, meta-analyses, gene enrichment analyses, and linear mixed models. To explore the relationship between genetics and gene expression, 2 papers studied allelic heterogeneity at expression quantitative trait loci (eQTLs) through conditional association analyses, and epistasis through interaction analyses. A third paper assessed the feasibility of using allele-specific binding to filter potential regulatory single-nucleotide polymorphisms (SNPs). Analytic approaches included linear mixed models based on measured genotypes in pedigrees, permutation tests, and covariance kernels. To incorporate both genotype and phenotype data with gene expression, 4 groups employed linear mixed models, nonparametric weighted U statistics, structural equation modeling, Bayesian unified frameworks, and multiple regression.
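As an illustration of the conditional association idea mentioned above, the following is a minimal sketch, not the method used in any of the GAW19 papers. It tests whether a second SNP is still associated with a transcript after adjusting for a lead SNP, which is one way allelic heterogeneity can show up. It uses ordinary least squares and therefore ignores relatedness; the papers summarized here used linear mixed models for pedigree data. The function names `ols_tstat` and `conditional_tstat` are hypothetical.

```python
import numpy as np

def ols_tstat(y, X, col):
    """Return the OLS t statistic for column `col` of design matrix X."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]          # residual degrees of freedom
    sigma2 = resid @ resid / dof           # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)  # covariance of the coefficients
    return beta[col] / np.sqrt(cov[col, col])

def conditional_tstat(expr, lead_snp, second_snp):
    """t statistic for `second_snp`, conditional on `lead_snp`.

    Assumption: samples are treated as unrelated (plain OLS). A pedigree
    analysis would add a kinship-based random effect instead.
    """
    X = np.column_stack([np.ones(len(expr)), lead_snp, second_snp])
    return ols_tstat(expr, X, col=2)
```

A large conditional statistic for the second SNP suggests a signal that the lead SNP does not explain; a statistic near zero suggests the second SNP was only tagging the lead SNP.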
Regarding the analysis of pedigree data, we found that gene expression is familial, indicating that at least 1 factor for pedigree membership, or multiple factors for the degree of relationship, should be included in analyses. We also developed a method to adjust for familiality before conducting weighted gene coexpression network analysis. For SNP association and conditional analyses, we found that FaST-LMM (Factored Spectrally Transformed Linear Mixed Model) and SOLAR-MGA (Sequential Oligogenic Linkage Analysis Routines - Major Gene Analysis) have similar type I and type II error rates and can be used almost interchangeably. To improve the power and precision of association tests, prior knowledge of DNase-I hypersensitivity sites or other relevant biological annotations can be incorporated into the analyses. On a biological level, expression quantitative trait loci (eQTLs) are genetically complex, exhibiting both allelic heterogeneity and epistasis. Analyzing genotype and phenotype data together with gene expression measurements was generally advantageous, yielding improved levels of significance and more interpretable biological models.
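The recommendation to include a factor for pedigree membership can be sketched as follows. This is an assumption-laden illustration, not the adjustment method the authors developed (which the abstract does not describe): it removes each pedigree's mean from every transcript, which is equivalent to regressing expression on a pedigree-membership factor, and then computes the gene-by-gene correlation matrix that a coexpression analysis would start from. The function names `center_within_family` and `coexpression` are hypothetical.

```python
import numpy as np

def center_within_family(expr, family_ids):
    """Subtract each pedigree's mean from every gene.

    expr: array of shape (samples, genes); family_ids: one label per sample.
    Assumption: familiality is modeled only as a shared pedigree mean,
    not by degree of relationship.
    """
    expr = np.asarray(expr, dtype=float)
    family_ids = np.asarray(family_ids)
    adjusted = expr.copy()
    for fam in np.unique(family_ids):
        rows = family_ids == fam
        adjusted[rows] -= expr[rows].mean(axis=0)
    return adjusted

def coexpression(expr):
    """Pearson correlation between genes (columns of expr)."""
    return np.corrcoef(expr, rowvar=False)

# Toy data: 4 samples from 2 pedigrees, 2 genes.
toy_expr = [[1.0, 10.0], [3.0, 14.0], [10.0, 0.0], [20.0, 4.0]]
toy_fam = ["A", "A", "B", "B"]
toy_adjusted = center_within_family(toy_expr, toy_fam)
toy_corr = coexpression(toy_adjusted)
```

Modeling the degree of relationship, as the text also suggests, would call for a kinship-based mixed model rather than simple centering.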
Pedigree data can be used both to conduct and to enhance analyses of gene expression.