Department of Biochemistry & Molecular Biology, University of Calgary.
Department of Mathematics & Statistics, University of Calgary.
Brief Bioinform. 2021 Jul 20;22(4). doi: 10.1093/bib/bbaa270.
The power of genotype-phenotype association mapping studies increases greatly when contributions from multiple variants in a focal region are meaningfully aggregated. Currently, there are two popular categories of variant aggregation methods. Transcriptome-wide association studies (TWAS) represent a set of emerging methods that select variants based on their effect on gene expression, providing pretrained linear combinations of variants for downstream association mapping. In contrast, kernel methods such as the sequence kernel association test (SKAT) model genotypic and phenotypic variance using various kernel functions that capture genetic similarity between subjects, allowing nonlinear effects to be included. From the perspective of machine learning, these two categories cover two complementary aspects of feature engineering: feature selection/pruning and feature aggregation. Thus far, no thorough comparison has been made between these categories, and no method exists that incorporates the advantages of both TWAS- and kernel-based methods. In this work, we developed a novel method called kernel-based TWAS (kTWAS) that applies TWAS-like feature selection to a SKAT-like kernel association test, combining the strengths of both approaches. Through extensive simulations, we demonstrate that kTWAS has higher power than TWAS and multiple SKAT-based protocols, and we identify novel disease-associated genes in Wellcome Trust Case Control Consortium genotyping array data and MSSNG (Autism) sequence data. The source code for kTWAS and our simulations is available in our GitHub repository (https://github.com/theLongLab/kTWAS).
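To illustrate the general idea of pairing TWAS-like feature selection with a SKAT-like kernel test, here is a minimal sketch. It is not the kTWAS implementation from the repository. It assumes a weighted linear kernel, an intercept-only null model for a quantitative trait, squared expression-prediction weights as kernel weights, and a permutation p-value in place of the analytic mixture-of-chi-square p-value that SKAT-style tests normally use. All function names and the toy data are hypothetical.

```python
import random


def kernel_weights_from_expression(expr_weights):
    """TWAS-like selection (assumed form): variants whose pretrained
    expression weight is zero get kernel weight 0 and are dropped;
    the rest are weighted by the squared expression weight."""
    return [w * w for w in expr_weights]


def kernel_score(genotypes, phenotype, weights):
    """SKAT-style score statistic Q = r' G W G' r for a weighted linear kernel.

    genotypes: n subjects x m variants, as lists of allele counts
    phenotype: n quantitative trait values
    weights:   m non-negative kernel weights
    For a linear kernel, Q reduces to sum_j w_j * (g_j' r)^2.
    """
    n = len(phenotype)
    mean_y = sum(phenotype) / n
    residuals = [y - mean_y for y in phenotype]  # intercept-only null model
    q = 0.0
    for j, w in enumerate(weights):
        if w == 0.0:
            continue  # variant was not selected
        score_j = sum(genotypes[i][j] * residuals[i] for i in range(n))
        q += w * score_j ** 2
    return q


def permutation_p_value(genotypes, phenotype, weights, n_perm=999, seed=0):
    """Empirical p-value: shuffle phenotypes and recompute Q each time."""
    rng = random.Random(seed)
    observed = kernel_score(genotypes, phenotype, weights)
    shuffled = list(phenotype)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if kernel_score(genotypes, shuffled, weights) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data: 3 subjects, 2 variants; the second variant has expression weight 0.
    G = [[0, 1], [1, 0], [2, 1]]
    y = [1.0, 2.0, 3.0]
    w = kernel_weights_from_expression([1.0, 0.0])
    print("Q =", kernel_score(G, y, w))
    print("p =", permutation_p_value(G, y, w, n_perm=99))
```

In this sketch the expression weights do two jobs: a zero weight prunes a variant, and a nonzero weight scales that variant's contribution to the kernel. The kernel statistic then aggregates squared per-variant scores rather than testing a single linear combination.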