Evaluation of a LASSO regression approach on the unrelated samples of Genetic Analysis Workshop 17.

Author information

Guo Wei, Elston Robert C, Zhu Xiaofeng

Affiliation

Department of Epidemiology and Biostatistics, Case Western Reserve University, 10900 Euclid Ave, Cleveland, OH 44106, USA.

Publication information

BMC Proc. 2011 Nov 29;5 Suppl 9(Suppl 9):S12. doi: 10.1186/1753-6561-5-S9-S12.

Abstract

The Genetic Analysis Workshop 17 data we used comprise 697 unrelated individuals genotyped at 24,487 single-nucleotide polymorphisms (SNPs) from a mini-exome scan, using real sequence data for 3,205 genes annotated by the 1000 Genomes Project and simulated phenotypes. We studied 200 sets of simulated phenotypes of trait Q2. An important feature of this data set is that most SNPs are rare, with 87% of the SNPs having a minor allele frequency less than 0.05. For rare SNP detection, in this study we performed a least absolute shrinkage and selection operator (LASSO) regression and F tests at the gene level and calculated the generalized degrees of freedom to avoid any selection bias. For comparison, we also carried out linear regression and the collapsing method, which sums the rare SNPs, modified for a quantitative trait and with two different allele frequency thresholds. The aim of this paper is to evaluate these four approaches in this mini-exome data and compare their performance in terms of power and false positive rates. In most situations the LASSO approach is more powerful than linear regression and collapsing methods. We also note the difficulty in determining the optimal threshold for the collapsing method and the significant role that linkage disequilibrium plays in detecting rare causal SNPs. If a rare causal SNP is in strong linkage disequilibrium with a common marker in the same gene, power will be much improved.
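
As a rough illustration of the collapsing method described above (summing the rare SNPs in a gene and testing the sum against a quantitative trait), the following is a minimal sketch and not the authors' implementation. It assumes genotypes are coded as minor-allele counts (0, 1, 2); the function names, the toy gene with four SNPs, the simulated effect size, and the 0.05 threshold used in the example are all hypothetical choices made for illustration.

```python
import numpy as np

def minor_allele_freq(genotypes):
    """Per-SNP allele frequency, assuming columns hold minor-allele counts (0/1/2)."""
    return genotypes.mean(axis=0) / 2.0

def collapse_rare(genotypes, maf_threshold):
    """Sum allele counts over SNPs whose sample MAF is below the threshold.

    Returns the per-individual burden score and the boolean mask of SNPs used.
    """
    rare = minor_allele_freq(genotypes) < maf_threshold
    return genotypes[:, rare].sum(axis=1), rare

def f_statistic(x, y):
    """F statistic (1 and n-2 df) for the simple linear regression of y on x."""
    n = len(y)
    design = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = np.sum((y - design @ beta) ** 2)   # residual sum of squares
    tss = np.sum((y - y.mean()) ** 2)        # total sum of squares
    return (tss - rss) / (rss / (n - 2))

# Toy gene (hypothetical): three rare SNPs and one common SNP, 697 individuals.
rng = np.random.default_rng(0)
n_individuals = 697
true_maf = np.array([0.01, 0.02, 0.03, 0.20])
G = rng.binomial(2, true_maf, size=(n_individuals, 4))

# Simulated quantitative trait: only the rare SNPs are causal (assumed effect 0.8).
y = 0.8 * G[:, :3].sum(axis=1) + rng.normal(size=n_individuals)

burden, rare_mask = collapse_rare(G, maf_threshold=0.05)
f_value = f_statistic(burden, y)
print("rare SNPs collapsed:", rare_mask.tolist())
print("F statistic for burden score:", round(float(f_value), 2))
```

Rerunning `collapse_rare` with a different `maf_threshold` changes which SNPs enter the sum, which is one way to see why the choice of threshold affects the test.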

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a04b/3287844/b44b9ec83705/1753-6561-5-S9-S12-1.jpg
