Stability of bivariate GWAS biomarker detection.

Author information

Bedő Justin, Rawlinson David, Goudey Benjamin, Ong Cheng Soon

Affiliations

NICTA Victoria Research Laboratory, University of Melbourne, Victoria, Australia; Department of Computing and Information Systems, University of Melbourne, Victoria, Australia.

NICTA Victoria Research Laboratory, University of Melbourne, Victoria, Australia; Department of Electrical & Electronic Engineering, University of Melbourne, Victoria, Australia.

Publication information

PLoS One. 2014 Apr 30;9(4):e93319. doi: 10.1371/journal.pone.0093319. eCollection 2014.

Abstract

Given the difficulty and effort required to confirm candidate causal SNPs detected in genome-wide association studies (GWAS), there is no practical way to definitively filter false positives. Recent advances in algorithmics and statistics have enabled repeated exhaustive search for bivariate features in a practical amount of time using standard computational resources, allowing us to use cross-validation to evaluate the stability. We performed 10 trials of 2-fold cross-validation of exhaustive bivariate analysis on seven Wellcome-Trust Case-Control Consortium GWAS datasets, comparing the traditional χ² test for association, the high-performance GBOOST method and the recently proposed GSS statistic (available at http://bioinformatics.research.nicta.com.au/software/gwis/). We use Spearman's correlation to measure the similarity between the folds of cross-validation. To compare incomplete lists of ranks we propose an extension to Spearman's correlation. The extension allows us to consider a natural threshold for feature selection where the correlation is zero. This is the first reported cross-validation study of exhaustive bivariate GWAS feature selection. We found that stability between ranked lists from different cross-validation folds was higher for GSS in the majority of diseases. A thorough analysis of the correlation between SNP-frequency and univariate χ² score demonstrated that the χ² test for association is highly confounded by main effects: SNPs with high univariate significance replicably dominate the ranked results. We show that removal of the univariately significant SNPs improves χ² replicability but risks filtering pairs involving SNPs with univariate effects. We empirically confirm that the stability of GSS and GBOOST was not affected by removal of univariately significant SNPs.
These results suggest that the GSS and GBOOST tests are successfully targeting bivariate association with phenotype and that GSS is able to reliably detect a larger set of SNP-pairs than GBOOST in the majority of the data we analysed. However, the χ² test for association was confounded by main effects.
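The stability measurement described in the abstract (repeated 2-fold cross-validation, with Spearman's correlation between the rankings obtained on the two folds) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `ranks`, `spearman`, `two_fold_split` and `stability`, and the caller-supplied `score_fn`, are hypothetical, and the sketch covers only plain Spearman correlation on complete lists, not the extension to incomplete lists that the paper proposes.

```python
import random


def ranks(scores):
    """Average ranks (1 = highest score); tied scores share their mean rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(scores_a, scores_b):
    """Spearman's correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(scores_a), ranks(scores_b))


def two_fold_split(n_samples, rng):
    """Randomly split sample indices into two halves."""
    idx = list(range(n_samples))
    rng.shuffle(idx)
    half = n_samples // 2
    return idx[:half], idx[half:]


def stability(score_fn, n_samples, n_trials=10, seed=0):
    """Mean Spearman correlation between fold rankings over repeated 2-fold CV.

    score_fn (hypothetical) maps a list of sample indices to one score per
    candidate feature, e.g. one association statistic per SNP pair.
    """
    rng = random.Random(seed)
    rhos = []
    for _ in range(n_trials):
        fold_a, fold_b = two_fold_split(n_samples, rng)
        rhos.append(spearman(score_fn(fold_a), score_fn(fold_b)))
    return sum(rhos) / len(rhos)
```

In this sketch a value near 1 means the two folds rank the candidate features in almost the same order, while a value near 0 means the rankings are essentially unrelated. Any real use would substitute an actual association statistic for `score_fn`.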


