A Robust and Powerful Set-Valued Approach to Rare Variant Association Analyses of Secondary Traits in Case-Control Sequencing Studies.

Author information

Kang Guolian, Bi Wenjian, Zhang Hang, Pounds Stanley, Cheng Cheng, Shete Sanjay, Zou Fei, Zhao Yanlong, Zhang Ji-Feng, Yue Weihua

Affiliations

Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, Tennessee 38105

Publication information

Genetics. 2017 Mar;205(3):1049-1062. doi: 10.1534/genetics.116.192377. Epub 2016 Dec 30.

DOI:10.1534/genetics.116.192377

PMID:28040743

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5340322/

Abstract

In many case-control designs of genome-wide association (GWAS) or next-generation sequencing (NGS) studies, extensive data are available on secondary traits that may correlate with the primary disease and share common genetic variants with it. Investigating these secondary traits can provide critical insights into disease etiology or pathology and enhance the GWAS or NGS results. Methods based on logistic regression (LG) were developed for this purpose. However, for the identification of rare variants (RVs), certain inadequacies in the LG models and algorithmic instability can cause severely inflated type I error and significant loss of power when the two traits are correlated and the RV is associated with the disease, especially at stringent significance levels. To address this issue, we propose a novel set-valued (SV) method that models a binary trait by dichotomization of an underlying continuous variable and incorporates this into the genetic association model as a critical component. Extensive simulations and an analysis of seven secondary traits in a GWAS of benign ethnic neutropenia show that the SV method consistently controls type I error well at stringent significance levels, has larger power than the LG-based methods, and is robust to the effect pattern of the genetic variant (risk or protective), rare or common variants, rare or common diseases, and trait distributions. Because of these advantages, we strongly recommend that the SV method be employed instead of the LG-based methods for secondary trait analyses in case-control sequencing studies.
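The core modeling idea in the abstract, a binary trait arising from dichotomization of an underlying continuous variable, can be illustrated with a small simulation. This is only a minimal sketch under assumptions that are not taken from the paper: the function name `simulate_dichotomized_trait`, the additive genotype effect `beta`, the standard normal noise, and the fixed `threshold` are all hypothetical choices for illustration. It does not implement the SV estimation or testing procedure itself.

```python
import numpy as np

def simulate_dichotomized_trait(n, maf, beta, threshold, seed=0):
    """Simulate a binary trait by thresholding a latent continuous variable.

    All parameter names and distributional choices here are illustrative
    assumptions, not the specification used in the paper.
    """
    rng = np.random.default_rng(seed)
    # Genotype: minor-allele count (0, 1, or 2) under Hardy-Weinberg equilibrium.
    genotype = rng.binomial(2, maf, size=n)
    # Latent continuous variable: additive genetic effect plus standard normal noise.
    latent = beta * genotype + rng.standard_normal(n)
    # Only the indicator "latent value exceeds the threshold" is observed.
    trait = (latent > threshold).astype(int)
    return genotype, trait

genotype, trait = simulate_dichotomized_trait(
    n=20000, maf=0.01, beta=1.0, threshold=1.0
)
carriers = genotype > 0
print("trait frequency among carriers:    ", trait[carriers].mean())
print("trait frequency among non-carriers:", trait[~carriers].mean())
```

With a positive `beta`, carriers of the rare allele should show a higher trait frequency than non-carriers, which is the kind of signal an association test on the dichotomized trait would try to detect.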
