Relationship between genomic distance-based regression and kernel machine regression for multi-marker association testing.

Author information

Pan Wei

Affiliation

Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455–0392, USA.

Publication information

Genet Epidemiol. 2011 May;35(4):211-6. doi: 10.1002/gepi.20567.

DOI:10.1002/gepi.20567

PMID:21308765

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3134543/

Abstract

To detect genetic association with common and complex diseases, two powerful yet quite different multimarker association tests have been proposed, genomic distance-based regression (GDBR) (Wessel and Schork [2006] Am J Hum Genet 79:821–833) and kernel machine regression (KMR) (Kwee et al. [2008] Am J Hum Genet 82:386–397; Wu et al. [2010] Am J Hum Genet 86:929–942). GDBR is based on relating a multimarker similarity metric for a group of subjects to variation in their trait values, while KMR is based on nonparametric estimates of the effects of the multiple markers on the trait through a kernel function or kernel matrix. Since the two approaches are both powerful and general, but appear quite different, it is important to know their specific relationships. In this report, we show that, under the condition that there is no other covariate, there is a striking correspondence between the two approaches for a quantitative or a binary trait: if the same positive semi-definite matrix is used as the centered similarity matrix in GDBR and as the kernel matrix in KMR, the F-test statistic in GDBR and the score test statistic in KMR are equal (up to some ignorable constants). The result is based on the connections of both methods to linear or logistic (random-effects) regression models.
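The correspondence stated in the abstract can be checked numerically for a quantitative trait with no covariates. The sketch below is illustrative only and is not taken from the paper: it assumes the usual GDBR pseudo-F form tr(HGH)/tr[(I-H)G(I-H)], with H the hat matrix of the centered trait, and a KMR score statistic of the form (y - ȳ)'K(y - ȳ) with scaling constants dropped. The function names, the simulated genotypes, and the linear kernel are all hypothetical choices. Under these assumptions F is a strictly increasing function of the score statistic Q, so both give the same permutation p-value when the same permutations are used.

```python
import numpy as np


def center(M):
    """Double-center a square matrix: J M J with J = I - 11'/n."""
    n = M.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return J @ M @ J


def gdbr_pseudo_f(G, y):
    """GDBR pseudo-F with the trait as the only regressor (no covariates)."""
    yc = y - y.mean()
    H = np.outer(yc, yc) / (yc @ yc)      # hat matrix of the centered trait
    R = np.eye(len(y)) - H
    return np.trace(H @ G @ H) / np.trace(R @ G @ R)


def kmr_score(K, y):
    """KMR score statistic for a quantitative trait, scaling constants dropped."""
    r = y - y.mean()                       # residuals under the intercept-only null
    return r @ K @ r


def perm_pvalue(stat, M, y, n_perm, seed):
    """Permutation p-value: share of permuted statistics >= the observed one."""
    rng = np.random.default_rng(seed)
    observed = stat(M, y)
    permuted = np.array([stat(M, rng.permutation(y)) for _ in range(n_perm)])
    return float(np.mean(permuted >= observed))


# Simulated data (assumption: 40 subjects, 8 markers coded 0/1/2).
rng = np.random.default_rng(0)
n, p = 40, 8
Z = rng.integers(0, 3, size=(n, p)).astype(float)
y = 0.3 * Z[:, 0] + rng.normal(size=n)

# The same positive semi-definite matrix serves as the centered similarity
# matrix in GDBR and as the kernel matrix in KMR.
K = center(Z @ Z.T)

F = gdbr_pseudo_f(K, y)
Q = kmr_score(K, y)

# Closed-form link: with q = Q / (yc'yc), F = q / (tr(K) - q).
yc = y - y.mean()
q = Q / (yc @ yc)
F_from_Q = q / (np.trace(K) - q)

# Same seed, hence the same permutations for both statistics.
p_gdbr = perm_pvalue(gdbr_pseudo_f, K, y, n_perm=200, seed=1)
p_kmr = perm_pvalue(kmr_score, K, y, n_perm=200, seed=1)

print(F, F_from_Q, p_gdbr, p_kmr)
```

Because tr(K) and the centered sum of squares of y do not change when y is permuted, the map from Q to F is the same increasing function in every permutation, which is why the two p-values coincide here. The binary-trait case mentioned in the abstract is not covered by this sketch.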

References

Powerful multi-marker association tests: unifying genomic distance-based regression and logistic regression.

Genet Epidemiol. 2010 Nov;34(7):680-8. doi: 10.1002/gepi.20529.

Hum Hered. 2010;70(2):109-31. doi: 10.1159/000312641. Epub 2010 Jul 3.

Hum Hered. 2010;70(2):132-40. doi: 10.1159/000312643. Epub 2010 Jul 3.

Powerful SNP-set analysis for case-control genome-wide association studies.

Am J Hum Genet. 2010 Jun 11;86(6):929-42. doi: 10.1016/j.ajhg.2010.05.002.

Association tests using kernel-based measures of multi-locus genotype similarity between individuals.

Genet Epidemiol. 2010 Apr;34(3):213-21. doi: 10.1002/gepi.20451.

Biometrics. 2009 Sep;65(3):822-32. doi: 10.1111/j.1541-0420.2008.01176.x. Epub 2009 Feb 4.

Asymptotic tests of association with multiple SNPs in linkage disequilibrium.

Genet Epidemiol. 2009 Sep;33(6):497-507. doi: 10.1002/gepi.20402.

Genetic background comparison using distance-based regression, with applications in population stratification evaluation and adjustment.

Genet Epidemiol. 2009 Jul;33(5):432-41. doi: 10.1002/gepi.20396.

Genetic mapping in human disease.

Science. 2008 Nov 7;322(5903):881-8. doi: 10.1126/science.1156409.

Personal genomes: The case of the missing heritability.

Nature. 2008 Nov 6;456(7218):18-21. doi: 10.1038/456018a.
