Department of Biostatistics and Informatics, University of Colorado Denver, Anschutz Medical Campus, Denver, Colorado, United States of America.
Syneos Health, Morrisville, North Carolina, United States of America.
PLoS Comput Biol. 2021 Oct 22;17(10):e1008986. doi: 10.1371/journal.pcbi.1008986. eCollection 2021 Oct.
High-throughput data such as metabolomics, genomics, transcriptomics, and proteomics have become familiar data types within the "-omics" family. For this work, we focus on subsets that interact with one another and represent these "pathways" as graphs. Observed pathways often have disjoint components, i.e., nodes or sets of nodes (metabolites, etc.) not connected to any others within the pathway, which notably reduces testing power. In this paper we propose the Pathway Integrated Regression-based Kernel Association Test (PaIRKAT), a new kernel machine regression method for incorporating known pathway information into the semi-parametric kernel regression framework. This work extends previous kernel machine approaches and also contributes an application of a graph kernel regularization method for overcoming disconnected pathways. By incorporating a regularized or "smoothed" graph into a score test, PaIRKAT can provide more powerful tests for associations between biological pathways and phenotypes of interest and will be helpful in identifying novel pathways for targeted clinical research. We evaluate this method through several simulation studies and an application to real metabolomics data from the COPDGene study. Our simulation studies illustrate the robustness of this method to incorrect and incomplete pathway knowledge, and the real data analysis shows meaningful improvements in testing power for pathways. PaIRKAT was developed for application to metabolomic pathway data, but the techniques are easily generalizable to other data sources with a graph-like structure.
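The general idea of folding a smoothed pathway graph into a kernel score test can be illustrated with a minimal sketch. This is not the PaIRKAT implementation: the regularized Laplacian form (I + tau * L)^{-1}, the linear kernel, the permutation p-value (standing in for the analytic null distribution that kernel score tests typically use), and all function and parameter names (`regularized_laplacian_kernel`, `kernel_score_test`, `tau`, `n_perm`) are assumptions chosen for illustration. The regularization keeps the smoothing matrix positive definite even when the graph contains isolated nodes or several components.

```python
import numpy as np


def regularized_laplacian_kernel(A, tau=1.0):
    """Return (I + tau * L)^{-1}, with L the symmetric normalized Laplacian.

    A is a symmetric adjacency matrix. Isolated nodes get zero rows in L,
    so the result stays well defined and positive definite.
    (Hypothetical helper; the regularizer choice is an assumption.)
    """
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    connected = deg > 0
    # Inverse square-root degrees, set to 0 for isolated nodes.
    inv_sqrt = np.zeros_like(deg)
    inv_sqrt[connected] = 1.0 / np.sqrt(deg[connected])
    L = np.diag(connected.astype(float)) - inv_sqrt[:, None] * A * inv_sqrt[None, :]
    return np.linalg.inv(np.eye(A.shape[0]) + tau * L)


def kernel_score_test(y, X, Z, A, tau=1.0, n_perm=999, seed=0):
    """Score-type statistic Q = r' K r with an approximate permutation p-value.

    y: (n,) phenotype; X: (n, p) covariates; Z: (n, m) pathway features
    (e.g. metabolite abundances); A: (m, m) pathway adjacency matrix.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    # Null model: phenotype regressed on intercept and covariates only.
    X1 = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    # Linear kernel on graph-smoothed features (an assumed kernel choice).
    S = regularized_laplacian_kernel(A, tau)
    K = Z @ S @ Z.T
    q_obs = resid @ K @ resid
    # Permuting null-model residuals gives an approximate reference distribution.
    q_perm = np.empty(n_perm)
    for b in range(n_perm):
        r = rng.permutation(resid)
        q_perm[b] = r @ K @ r
    p_value = (1 + np.sum(q_perm >= q_obs)) / (n_perm + 1)
    return q_obs, p_value


if __name__ == "__main__":
    # Toy pathway: a 3-node chain plus one isolated node (simulated data only).
    A = np.array([[0, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 0],
                  [0, 0, 0, 0]], dtype=float)
    rng = np.random.default_rng(1)
    n = 50
    X = rng.normal(size=(n, 1))
    Z = rng.normal(size=(n, 4))
    y = X[:, 0] + 0.5 * Z[:, 0] + rng.normal(size=n)
    q, p = kernel_score_test(y, X, Z, A)
    print(f"Q = {q:.3f}, permutation p-value = {p:.3f}")
```

In this sketch, larger values of `tau` smooth more strongly along pathway edges, while `tau = 0` reduces the kernel to an ordinary linear kernel that ignores the graph.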