Yazdani Akram, Yazdani Azam, Boerwinkle Eric
Human Genetics Center, University of Texas Health Science Center at Houston, TX, USA.
Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA.
BMC Bioinformatics. 2015 Dec 4;16:405. doi: 10.1186/s12859-015-0825-4.
The availability of affordable and accessible whole genome sequencing for biomedical applications poses a number of statistical challenges and opportunities, particularly related to the analysis of rare variants and the sparseness of the data. Although effort has been devoted to addressing these challenges, the performance of statistical methods for rare-variant analysis still needs further consideration.
We introduce a new approach that applies restricted principal component analysis with convex penalization and then selects the best predictors of a phenotype with a concave penalized regression model, while estimating the impact of each genomic region on the phenotype. Using simulated data, we show that the proposed method maintains good power for association testing while keeping the false discovery rate low under a variety of genetic architectures. Illustrative data analyses show encouraging results for this method in comparison with other commonly applied methods for rare-variant analysis.
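The two-stage idea can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes an L1-penalized rank-one approximation as the convex-penalized PCA step and the minimax concave penalty (MCP) as the concave penalty; the abstract names neither. The function names (`sparse_pc_score`, `mcp_regression`, `fit_regions`) and the tuning values (`lam`, `gamma`) are hypothetical choices made for illustration.

```python
import numpy as np


def soft_threshold(z, lam):
    """Soft-thresholding operator, the proximal map of the L1 (convex) penalty."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def sparse_pc_score(G, lam=0.1, n_iter=100):
    """Score of the first sparse principal component of one genomic region.

    G is an (n_samples, n_variants) genotype matrix. The loading vector comes
    from an L1-penalized rank-one approximation (an assumed stand-in for the
    paper's restricted PCA), computed by alternating updates.
    """
    X = G - G.mean(axis=0)
    v = np.linalg.svd(X, full_matrices=False)[2][0]  # start from ordinary PCA
    for _ in range(n_iter):
        u = X @ v
        norm = np.linalg.norm(u)
        if norm == 0.0:  # monomorphic region, or all loadings thresholded away
            break
        v = soft_threshold(X.T @ (u / norm), lam)
    return X @ v


def mcp_regression(Z, y, lam=0.1, gamma=3.0, n_iter=200):
    """Least squares with the MCP (concave) penalty, by coordinate descent.

    Assumes each column of Z has mean 0 and mean square 1 (or is all zero).
    """
    n, k = Z.shape
    beta = np.zeros(k)
    r = y - y.mean()  # residual for the current beta (all zeros at the start)
    for _ in range(n_iter):
        for j in range(k):
            rho = Z[:, j] @ r / n + beta[j]
            if abs(rho) <= gamma * lam:
                new = soft_threshold(rho, lam) / (1.0 - 1.0 / gamma)
            else:
                new = rho  # large effects are left unshrunk
            r -= Z[:, j] * (new - beta[j])
            beta[j] = new
    return beta


def fit_regions(regions, y, pca_lam=0.1, mcp_lam=0.1):
    """Summarize each region by one sparse PC score, then select regions."""
    S = np.column_stack([sparse_pc_score(G, pca_lam) for G in regions])
    sd = S.std(axis=0)
    sd[sd == 0.0] = 1.0  # leave all-zero scores at zero instead of dividing by 0
    Z = (S - S.mean(axis=0)) / sd
    return mcp_regression(Z, np.asarray(y, dtype=float), mcp_lam)


# Toy data: region 0 holds two variants in perfect LD that drive the
# phenotype; region 1 is monomorphic and carries no signal.
carrier = np.array([1, 1, 0, 0, 0, 0, 0, 0], dtype=float)
regions = [np.column_stack([carrier, carrier]), np.zeros((8, 2))]
beta = fit_regions(regions, 2.0 * carrier)
print(beta)
```

In this toy example the coefficient for region 1 is exactly zero, while region 0 keeps a nonzero coefficient whose sign depends on the arbitrary orientation of the principal component. In practice the penalty levels would be chosen by cross-validation or an information criterion rather than fixed as they are here.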
By taking into account linkage disequilibrium and sparseness of the data, the proposed method improves power and controls the false discovery rate compared to other commonly applied methods for rare variant analyses.