A quality control algorithm for filtering SNPs in genome-wide association studies.

Affiliation

Bioinformatics Research Center, North Carolina State University, Raleigh, NC 27695-7566, USA.

Publication information

Bioinformatics. 2010 Jul 15;26(14):1731-7. doi: 10.1093/bioinformatics/btq272. Epub 2010 May 25.

Abstract

MOTIVATION

The quality control (QC) filtering of single nucleotide polymorphisms (SNPs) is an important step in genome-wide association studies to minimize potential false findings. SNP QC commonly uses expert-guided filters based on QC variables [e.g. Hardy-Weinberg equilibrium, missing proportion (MSP) and minor allele frequency (MAF)] to remove SNPs with insufficient genotyping quality. The rationale of the expert filters is sensible and concrete, but its implementation requires arbitrary thresholds and does not jointly consider all QC features.
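For illustration, the conventional expert-filter approach described above can be sketched as three independent cutoffs. This is a minimal sketch only: the threshold values, the `SnpQc` record and the function name are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class SnpQc:
    """Per-SNP QC summary (hypothetical record layout)."""
    snp_id: str
    hwe_p: float  # Hardy-Weinberg equilibrium test p-value
    msp: float    # missing proportion
    maf: float    # minor allele frequency


# Illustrative cutoffs only; the source does not state specific values.
HWE_P_MIN = 1e-4
MSP_MAX = 0.05
MAF_MIN = 0.01


def passes_expert_filter(snp: SnpQc) -> bool:
    """Keep a SNP only if every QC variable clears its own fixed cutoff."""
    return (
        snp.hwe_p >= HWE_P_MIN
        and snp.msp <= MSP_MAX
        and snp.maf >= MAF_MIN
    )


# Two SNPs that differ only marginally in MSP land on opposite sides of the cutoff.
borderline_keep = SnpQc("rs_demo_1", hwe_p=0.40, msp=0.049, maf=0.011)
borderline_drop = SnpQc("rs_demo_2", hwe_p=0.40, msp=0.051, maf=0.300)
kept = [s.snp_id for s in (borderline_keep, borderline_drop) if passes_expert_filter(s)]
```

Each variable is checked in isolation, so a rare SNP with MSP just under the cutoff is kept while a common SNP with MSP just over it is dropped. That is the kind of fixed, non-joint thresholding the abstract contrasts with its proposal.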

RESULTS

We propose an algorithm based on principal component analysis and clustering analysis to identify low-quality SNPs. The method minimizes the use of arbitrary cutoff values, allows a collective consideration of the QC features and provides conditional thresholds contingent on other QC variables (e.g. different MSP thresholds for different MAFs). We applied our method to the seven studies from the Wellcome Trust Case Control Consortium and the major depressive disorder study from the Genetic Association Information Network, and measured its performance against the expert filters using the following criteria: (i) percentage of SNPs excluded due to low quality; (ii) inflation factor of the test statistics (lambda); (iii) number of false associations found in the filtered dataset; and (iv) number of true associations missed in the filtered dataset. The results suggest that, with the same or fewer SNPs excluded, the proposed algorithm tends to give a similar or lower value of lambda and a reduced number of false associations, while retaining all true associations.
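The general idea of combining the QC variables through principal component analysis and then clustering can be sketched as follows. This is not the authors' algorithm, whose details the abstract does not give. It is a simplified illustration that assumes three features (MSP, MAF and -log10 of the HWE p-value), projects onto the first principal component only, splits the scores with a one-dimensional two-means step, and treats the smaller cluster as the low-quality one. All function names and the toy data are hypothetical, and only the Python standard library is used.

```python
import math
from statistics import mean, pstdev


def standardize(rows):
    """Scale each column to zero mean and unit variance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(x - m) / s for x, m, s in zip(r, mus, sds)] for r in rows]


def first_pc(rows, iters=200):
    """Leading eigenvector of the covariance matrix, found by power iteration."""
    n, p = len(rows), len(rows[0])
    cov = [[sum(r[i] * r[j] for r in rows) / n for j in range(p)] for i in range(p)]
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def two_means_1d(xs, iters=50):
    """Two-cluster k-means on scalars, initialised at the minimum and maximum."""
    lo, hi = min(xs), max(xs)
    labels = [0] * len(xs)
    for _ in range(iters):
        labels = [0 if abs(x - lo) <= abs(x - hi) else 1 for x in xs]
        g0 = [x for x, lab in zip(xs, labels) if lab == 0]
        g1 = [x for x, lab in zip(xs, labels) if lab == 1]
        new_lo = mean(g0) if g0 else lo
        new_hi = mean(g1) if g1 else hi
        if (new_lo, new_hi) == (lo, hi):
            break
        lo, hi = new_lo, new_hi
    return labels


def flag_low_quality(snps):
    """Return IDs in the minority cluster (assumed here to be the low-quality SNPs)."""
    feats = [[s["msp"], s["maf"], -math.log10(s["hwe_p"])] for s in snps]
    z = standardize(feats)
    v = first_pc(z)
    scores = [sum(a * b for a, b in zip(row, v)) for row in z]
    labels = two_means_1d(scores)
    minority = 1 if sum(labels) * 2 < len(labels) else 0
    return [s["id"] for s, lab in zip(snps, labels) if lab == minority]


# Toy data (made up): six well-behaved SNPs and two with high MSP and extreme HWE p-values.
EXAMPLE_SNPS = [
    {"id": "rs_ok1", "msp": 0.001, "maf": 0.30, "hwe_p": 0.80},
    {"id": "rs_ok2", "msp": 0.002, "maf": 0.25, "hwe_p": 0.55},
    {"id": "rs_ok3", "msp": 0.000, "maf": 0.40, "hwe_p": 0.90},
    {"id": "rs_ok4", "msp": 0.003, "maf": 0.15, "hwe_p": 0.35},
    {"id": "rs_ok5", "msp": 0.001, "maf": 0.45, "hwe_p": 0.60},
    {"id": "rs_ok6", "msp": 0.002, "maf": 0.20, "hwe_p": 0.75},
    {"id": "rs_bad1", "msp": 0.080, "maf": 0.02, "hwe_p": 1e-9},
    {"id": "rs_bad2", "msp": 0.120, "maf": 0.01, "hwe_p": 1e-12},
]
flagged = flag_low_quality(EXAMPLE_SNPS)
```

Because the split is made on a score that mixes all three variables, the boundary is not a fixed cutoff on any single one of them. This is the sense in which a joint approach can yield thresholds on one QC variable that depend on the others.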

AVAILABILITY

The algorithm is available at http://www4.stat.ncsu.edu/jytzeng/software.php
