The choice of null distributions for detecting gene-gene interactions in genome-wide association studies.

Affiliation

Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology, Hong Kong.

Publication information

BMC Bioinformatics. 2011 Feb 15;12 Suppl 1(Suppl 1):S26. doi: 10.1186/1471-2105-12-S1-S26.

Abstract

BACKGROUND

In genome-wide association studies (GWAS), the number of single-nucleotide polymorphisms (SNPs) typically ranges from 500,000 to 1,000,000. Detecting gene-gene interactions in GWAS is therefore computationally challenging, because it involves hundreds of billions of SNP pairs. Stage-wise strategies are often used to overcome this computational difficulty. In the first stage, fast screening methods (e.g., Tuning ReliefF) reduce the whole SNP set to a small subset. In the second stage, sophisticated modeling methods (e.g., multifactor-dimensionality reduction (MDR)) are applied to that subset to identify interesting interaction models and the corresponding interaction patterns. In the third stage, the significance of the identified interaction patterns is evaluated by hypothesis testing.
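The "hundreds of billions" figure follows from counting unordered SNP pairs, n(n-1)/2. The short check below is only an illustration of that arithmetic; the function name `snp_pair_count` is hypothetical and not part of the paper.

```python
from math import comb

def snp_pair_count(n_snps: int) -> int:
    """Number of unordered SNP pairs among n_snps SNPs, i.e. n choose 2."""
    return comb(n_snps, 2)

# The two ends of the SNP range quoted in the abstract.
for n in (500_000, 1_000_000):
    print(f"{n:,} SNPs -> {snp_pair_count(n):,} pairs")
# 500,000 SNPs -> 124,999,750,000 pairs
# 1,000,000 SNPs -> 499,999,500,000 pairs
```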

RESULTS

In this paper, we show that this stage-wise strategy can fail to control the false positive rate if the null distribution is not chosen appropriately, because screening and modeling may change the null distribution used in hypothesis testing. In our simulation study, we use several popular screening methods and the popular modeling method MDR as examples to show the effect of an inappropriate choice of null distribution. To obtain an appropriate null distribution, we suggest using a permutation test or testing on an independent data set. We demonstrate the performance of both approaches using synthetic data and a real genome-wide data set from an age-related macular degeneration (AMD) study.
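One common way to apply a permutation test in this setting, assumed here rather than taken from the abstract, is to shuffle the case/control labels and rerun the whole pipeline (screening and modeling) on every shuffled data set, so that the null distribution reflects the selection steps. The sketch below uses deliberately simple stand-ins: `marginal_score` is not ReliefF and `pair_chi2` is not MDR, and all function names, parameters, and the synthetic data are hypothetical.

```python
import random
from itertools import combinations

def marginal_score(snp, labels):
    """Stand-in screening score: |mean genotype in cases - mean in controls|."""
    cases = [g for g, y in zip(snp, labels) if y == 1]
    controls = [g for g, y in zip(snp, labels) if y == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))

def pair_chi2(a, b, labels):
    """Stand-in modeling statistic: Pearson chi-square of the table of
    two-SNP genotype combinations versus case/control status."""
    counts = {}
    for ga, gb, y in zip(a, b, labels):
        counts.setdefault((ga, gb), [0, 0])[y] += 1
    n = len(labels)
    n_case = sum(labels)
    col = [n - n_case, n_case]  # column totals: controls, cases
    stat = 0.0
    for cell in counts.values():
        row = cell[0] + cell[1]
        for y in (0, 1):
            expected = row * col[y] / n
            stat += (cell[y] - expected) ** 2 / expected
    return stat

def pipeline_stat(genotypes, labels, keep=5):
    """Screen down to `keep` SNPs, then return the best pairwise statistic."""
    ranked = sorted(range(len(genotypes)),
                    key=lambda j: marginal_score(genotypes[j], labels),
                    reverse=True)
    kept = ranked[:keep]
    return max(pair_chi2(genotypes[i], genotypes[j], labels)
               for i, j in combinations(kept, 2))

def permutation_p_value(genotypes, labels, n_perm=200, keep=5, seed=0):
    """Permutation p-value where screening and modeling are repeated
    on each label-shuffled data set."""
    rng = random.Random(seed)
    observed = pipeline_stat(genotypes, labels, keep)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pipeline_stat(genotypes, shuffled, keep) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_perm + 1)

# Synthetic null data: 30 SNPs, 200 individuals, no true association.
data_rng = random.Random(1)
genotypes = [[data_rng.choice((0, 1, 2)) for _ in range(200)] for _ in range(30)]
labels = [1] * 100 + [0] * 100
print(permutation_p_value(genotypes, labels))
```

Comparing the observed statistic against a fixed reference distribution that ignores the screening and best-pair selection would be the kind of mismatch the abstract warns about; the permuted statistics here go through the same selection as the observed one.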

CONCLUSIONS

The permutation test or testing on an independent data set can help in choosing appropriate null distributions for hypothesis testing, providing more reliable results in practice.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/45a2/3044281/cacc94959b62/1471-2105-12-S1-S26-1.jpg
