Rare Variants Association Analysis in Large-Scale Sequencing Studies at the Single Locus Level.

Authors

Jeng Xinge Jessie, Daye Zhongyin John, Lu Wenbin, Tzeng Jung-Ying

Affiliations

Department of Statistics, North Carolina State University, Raleigh, North Carolina, United States of America.

Epidemiology and Biostatistics, University of Arizona, Tucson, Arizona, United States of America.

Publication information

PLoS Comput Biol. 2016 Jun 29;12(6):e1004993. doi: 10.1371/journal.pcbi.1004993. eCollection 2016 Jun.

Abstract

Genetic association analyses of rare variants in next-generation sequencing (NGS) studies are fundamentally challenging due to the presence of a very large number of candidate variants at extremely low minor allele frequencies. Recent developments often focus on pooling multiple variants to provide association analysis at the gene level instead of the locus level. Nonetheless, pinpointing individual variants is a critical goal for genomic research, as such information can facilitate the precise delineation of the molecular mechanisms and functions of genetic factors in disease. Due to the extreme rarity of mutations and the high dimensionality, the significance of causal variants cannot easily stand out from that of noncausal ones. Consequently, standard false-positive control procedures, such as the Bonferroni correction and false discovery rate (FDR) control, are often impractical to apply, as a majority of the causal variants can only be identified along with a small but unknown number of noncausal variants. To provide informative analysis of individual variants in large-scale sequencing studies, we propose the Adaptive False-Negative Control (AFNC) procedure, which can include a large proportion of causal variants with high confidence by introducing a novel statistical inquiry to determine those variants that can be confidently dismissed as noncausal. The AFNC provides a general framework that can accommodate a variety of models and significance tests. The procedure is computationally efficient and can adapt to the underlying proportion of causal variants and the quality of significance rankings. Extensive simulation studies across a wide range of scenarios demonstrate that the AFNC is advantageous for identifying individual rare variants, whereas the Bonferroni and FDR procedures are overly conservative for rare-variant association studies. In the analyses of the CoLaus dataset, the AFNC identified the individual variants most responsible for gene-level significance. Moreover, single-variant results from the AFNC have been successfully applied to infer related genes using annotation information.
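
For readers unfamiliar with the two false-positive control procedures the abstract compares against, the following is a minimal sketch of Bonferroni and Benjamini-Hochberg (FDR) selection on a list of per-variant p-values. It does not implement the AFNC, whose details the abstract does not give. The function names and the p-values are hypothetical and chosen only for illustration.

```python
def bonferroni_reject(pvalues, alpha=0.05):
    """Return indices of tests rejected by Bonferroni at family-wise level alpha."""
    m = len(pvalues)
    return [i for i, p in enumerate(pvalues) if p <= alpha / m]


def benjamini_hochberg_reject(pvalues, alpha=0.05):
    """Return indices of tests rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest 1-based rank k such that p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            k_max = rank
    # Reject every test ranked at or below k_max.
    return sorted(order[:k_max])


# Hypothetical p-values for ten candidate variants (illustration only).
pvalues = [0.0001, 0.004, 0.012, 0.03, 0.2, 0.45, 0.6, 0.75, 0.9, 0.95]

bonf = bonferroni_reject(pvalues)
bh = benjamini_hochberg_reject(pvalues)
print("Bonferroni selects:", bonf)
print("Benjamini-Hochberg selects:", bh)
```

On this toy input the Benjamini-Hochberg procedure selects one more variant than Bonferroni, which reflects the general ordering of the two: Bonferroni is the stricter threshold. Both control false positives, whereas the AFNC described above is framed around controlling false negatives.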
