Department of Statistics and Probability, Michigan State University, East Lansing, MI 48824, USA.
Bioinformatics. 2022 Mar 4;38(6):1560-1567. doi: 10.1093/bioinformatics/btab851.
The kernel-based association test (KAT) is a popular approach for evaluating the association between the expression of a gene set (e.g. a pathway) and a phenotypic trait. KATs rely on kernel functions, which measure sample similarity across multiple features and can therefore capture linear or non-linear relationships among the features in a gene set. Standard kernel functions, however, are computed without any network (graph) information about the features. Because genes in a functional group (e.g. a pathway) are generally not independent, owing to regulatory interactions, incorporating regulatory network (or graph) information can potentially increase the power of KAT. In this work, we propose a graph-embedded kernel association test, termed gKAT. gKAT incorporates prior pathway knowledge into hypothesis testing when constructing the kernel function.
We apply a diffusion kernel to capture the graph structure of a gene set, then use this information to build the kernel function for the subsequent association test. We illustrate the geometric meaning of the approach. Through extensive simulation studies, we show that the proposed gKAT algorithm can improve testing power compared with the same test without graph structure. An application to a real dataset further demonstrates the utility of the method.
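As a rough illustration of the two steps described above, the following sketch computes a diffusion kernel over a small gene graph and then uses it to weight a sample-by-sample kernel. The abstract gives no formulas, so everything here is an assumption: the diffusion kernel is taken in its standard form D = exp(-beta * L), with L the graph Laplacian, and the combination K = X D X^T is only one plausible way to embed the graph, not necessarily the construction used in gKAT. The function names, the bandwidth `beta`, and the toy data are hypothetical. The example is in Python with the standard library only; the authors' own code is in R.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def graph_laplacian(adj):
    """Unnormalised Laplacian L = degree matrix - adjacency matrix."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0.0) - adj[i][j]
             for j in range(n)] for i in range(n)]


def diffusion_kernel(adj, beta=0.5, terms=20):
    """Assumed form D = exp(-beta * L), via Taylor series with scaling and squaring."""
    n = len(adj)
    lap = graph_laplacian(adj)
    # Halve the exponent until its norm is small, so the series converges fast.
    norm = beta * max(sum(abs(v) for v in row) for row in lap)
    squarings = 0
    while norm > 0.5:
        norm /= 2.0
        squarings += 1
    scale = beta / (2 ** squarings)
    m = [[-scale * v for v in row] for row in lap]
    result, term = identity(n), identity(n)
    for k in range(1, terms + 1):
        # term now holds m^k / k!
        term = [[v / k for v in row] for row in matmul(term, m)]
        result = [[r + t for r, t in zip(r_row, t_row)]
                  for r_row, t_row in zip(result, term)]
    # Undo the scaling: exp(A) = (exp(A / 2^s))^(2^s).
    for _ in range(squarings):
        result = matmul(result, result)
    return result


def graph_embedded_kernel(x, adj, beta=0.5):
    """Hypothetical graph-weighted linear kernel K = X D X^T (samples x samples)."""
    d = diffusion_kernel(adj, beta)
    x_t = [list(col) for col in zip(*x)]
    return matmul(matmul(x, d), x_t)


# Toy data (made up): 3 genes on a path graph g1 - g2 - g3, 2 samples.
adj = [[0.0, 1.0, 0.0],
       [1.0, 0.0, 1.0],
       [0.0, 1.0, 0.0]]
x = [[1.0, 0.5, -1.0],
     [0.2, 0.1, 0.3]]
D = diffusion_kernel(adj, beta=0.5)
K = graph_embedded_kernel(x, adj, beta=0.5)
```

With an empty graph the Laplacian is zero, D reduces to the identity, and K falls back to the ordinary linear kernel X X^T, which corresponds to the comparison case without graph structure. The resulting K would then be passed to whichever kernel association test is in use; that testing step is not shown here.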
The R code used for the analysis can be accessed at https://github.com/JialinQu/gKAT.
Supplementary data are available at Bioinformatics online.