Department of Medicine, Jeffrey Cheah Biomedical Centre, Cambridge Biomedical Campus, Cambridge Institute of Therapeutic Immunology and Infectious Disease, University of Cambridge, Cambridge, UK.
MRC Biostatistics Unit, University of Cambridge, Cambridge, UK.
Genet Epidemiol. 2021 Apr;45(3):324-337. doi: 10.1002/gepi.22374. Epub 2020 Dec 28.
A transcriptome-wide association study (TWAS) attempts to identify disease-associated genes by imputing gene expression into a genome-wide association study (GWAS) using an expression quantitative trait loci (eQTL) data set and then testing for associations with a trait of interest. Regulatory processes may be shared across related tissues, and one natural extension of TWAS is to harness cross-tissue correlation in gene expression to improve prediction accuracy. Here, we studied multi-tissue extensions of lasso regression and random forests (RF): joint lasso and RF-MTL (multi-task learning RF), respectively. We found that, on our chosen eQTL data set, multi-tissue methods were generally more accurate than their single-tissue counterparts, with RF-MTL performing best. Simulations showed that these benefits generally translated into more associated genes being identified, although they also highlighted that joint lasso tended to erroneously identify genes in one tissue when an eQTL signal for that gene existed in another. Applying the four methods to a type 1 diabetes GWAS, we found that multi-tissue methods identified more unique associated genes for most of the tissues considered. We conclude that multi-tissue methods are competitive with and, for some cell types, superior to single-tissue approaches, and hold much promise for TWAS.
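The generic TWAS workflow described in the abstract (train an expression predictor on an eQTL panel, impute expression into a GWAS cohort, test the imputed expression against the trait) can be illustrated with a minimal single-tissue sketch. This is not the authors' code and does not implement joint lasso or RF-MTL. It uses a plain coordinate-descent lasso on simulated data, and every name, sample size, effect size and penalty value below is a hypothetical choice made for illustration. The association test uses a normal approximation to the correlation t-statistic, which is an assumption that is reasonable only for large cohorts.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the toy example is reproducible

def simulate_genotypes(n, p, maf=0.3):
    """Draw n individuals x p SNPs as 0/1/2 minor-allele counts."""
    return [[(rng.random() < maf) + (rng.random() < maf) for _ in range(p)]
            for _ in range(n)]

def column_means(X):
    n = len(X)
    return [sum(row[j] for row in X) / n for j in range(len(X[0]))]

def centre(X, means):
    return [[x - m for x, m in zip(row, means)] for row in X]

def soft_threshold(z, lam):
    return math.copysign(max(abs(z) - lam, 0.0), z)

def fit_lasso(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam*||b||_1 on centred data."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - Xb; beta starts at zero
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_ss[j] == 0.0:
                continue  # monomorphic SNP carries no information
            # Covariance of SNP j with the partial residual (SNP j's effect added back)
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_ss[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

# Hypothetical ground truth: two of ten cis-SNPs regulate the gene.
TRUE_WEIGHTS = {0: 0.8, 3: -0.5}

def genetic_expression(row):
    return sum(w * row[j] for j, w in TRUE_WEIGHTS.items())

# Step 1: train the expression predictor on a simulated eQTL reference panel.
G_ref = simulate_genotypes(300, 10)
expr = [genetic_expression(r) + rng.gauss(0, 0.5) for r in G_ref]
ref_means = column_means(G_ref)
expr_mean = sum(expr) / len(expr)
beta = fit_lasso(centre(G_ref, ref_means), [e - expr_mean for e in expr], lam=0.05)

# Step 2: impute expression into a simulated GWAS cohort with no measured expression.
G_gwas = simulate_genotypes(1000, 10)
trait = [0.5 * genetic_expression(r) + rng.gauss(0, 1.0) for r in G_gwas]
imputed = [sum(b * x for b, x in zip(beta, row)) for row in centre(G_gwas, ref_means)]

# Step 3: test imputed expression against the trait (normal approximation).
r = pearson(imputed, trait)
n = len(trait)
z = r * math.sqrt((n - 2) / (1 - r * r))
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value

print("lasso weights:", [round(b, 3) for b in beta])
print("TWAS z-score:", round(z, 2))
```

In a multi-tissue setting, step 1 is where the methods compared in the abstract would differ: the predictor would be trained jointly across tissues instead of on one tissue at a time, while steps 2 and 3 would stay the same.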