Identifying main effects and epistatic interactions from large-scale SNP data via adaptive group Lasso.

Affiliation

Laboratory for Bioinformatics and Computational Biology, Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, PR China.

Publication information

BMC Bioinformatics. 2010 Jan 18;11 Suppl 1(Suppl 1):S18. doi: 10.1186/1471-2105-11-S1-S18.

Abstract

BACKGROUND

Single nucleotide polymorphism (SNP) based association studies aim at identifying SNPs associated with phenotypes, for example, complex diseases. The associated SNPs may influence the disease risk individually (main effects) or behave jointly (epistatic interactions). For the analysis of high throughput data, the main difficulty is that the number of SNPs far exceeds the number of samples. This difficulty is amplified when identifying interactions.
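To make the scale of the interaction problem concrete, the short sketch below counts the candidate pairwise interaction terms for a given number of SNPs. The panel size of 500,000 SNPs is an illustrative assumption, not a figure taken from this abstract.

```python
import math


def count_pairwise_interactions(n_snps):
    """Number of unordered SNP pairs, i.e. n_snps * (n_snps - 1) / 2."""
    return math.comb(n_snps, 2)


n_snps = 500_000  # assumed panel size, for illustration only
n_pairs = count_pairwise_interactions(n_snps)
print(f"{n_snps:,} SNPs -> {n_pairs:,} candidate pairs")
```

Because the number of pairs grows quadratically with the number of SNPs, the gap between variables and samples widens sharply once interactions are included.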

RESULTS

In this paper, we propose an Adaptive Group Lasso (AGL) model for large-scale association studies. Our model enables us to analyze SNPs and their interactions simultaneously. We achieve this by introducing a sparsity constraint in our model, based on the fact that only a small fraction of SNPs is disease-associated. To reduce the number of false positive findings, we develop an adaptive reweighting scheme to enhance sparsity. In addition, our method treats SNPs and their interactions as factors and identifies them in a grouped manner. Thus, it is flexible enough to analyze various disease models, especially for interaction detection. However, because the computation is intensive when millions of interaction terms need to be searched during model fitting, our method needs to be combined with filtering methods when applied to genome-wide data for detecting interactions.
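The ideas in this paragraph (grouped factors, a sparsity penalty, and adaptive reweighting) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the loss function, the optimizer, or the exact reweighting rule, so the sketch assumes a squared-error loss, a plain proximal-gradient solver, and weights of the form 1 / (group norm + eps). All function names (`encode_groups`, `group_soft_threshold`, `fit_group_lasso`, `adaptive_group_lasso`) and the dummy coding of genotypes are hypothetical choices made for illustration.

```python
import itertools
import numpy as np


def encode_groups(genotypes):
    """Build one column group per SNP and one per SNP pair (assumed coding).

    genotypes: (n, p) integer array with entries 0, 1, 2.
    Each SNP becomes two centered dummy columns (genotype 1, genotype 2);
    each pair becomes the four centered products of those dummy columns.
    """
    n, p = genotypes.shape
    dummies = []
    for j in range(p):
        d = np.column_stack([genotypes[:, j] == 1, genotypes[:, j] == 2]).astype(float)
        dummies.append(d - d.mean(axis=0))
    blocks = list(dummies)
    names = [f"snp{j}" for j in range(p)]
    for j, k in itertools.combinations(range(p), 2):
        prod = np.column_stack([dummies[j][:, a] * dummies[k][:, b]
                                for a in range(2) for b in range(2)])
        blocks.append(prod - prod.mean(axis=0))
        names.append(f"snp{j}xsnp{k}")
    # Record which columns of the design matrix belong to each group.
    sizes = [b.shape[1] for b in blocks]
    starts = np.cumsum([0] + sizes[:-1])
    groups = [np.arange(s, s + m) for s, m in zip(starts, sizes)]
    return np.hstack(blocks), groups, names


def group_soft_threshold(v, t):
    """Shrink the Euclidean norm of v by t; return zeros if the norm is <= t."""
    norm = np.linalg.norm(v)
    if norm <= t:
        return np.zeros_like(v)
    return (1.0 - t / norm) * v


def fit_group_lasso(X, y, groups, lam, weights, n_iter=500):
    """Proximal gradient on 1/(2n) ||y - X b||^2 + lam * sum_g w_g ||b_g||_2."""
    n, p = X.shape
    step = n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    beta = np.zeros(p)
    for _ in range(n_iter):
        z = beta - step * (X.T @ (X @ beta - y) / n)
        for g, idx in enumerate(groups):
            beta[idx] = group_soft_threshold(z[idx], step * lam * weights[g])
    return beta


def adaptive_group_lasso(X, y, groups, lam, n_rounds=3, eps=1e-3):
    """Refit with weights 1 / (||b_g|| + eps) so weak groups are penalized more."""
    y = y - y.mean()
    weights = np.ones(len(groups))
    beta = np.zeros(X.shape[1])
    for _ in range(n_rounds):
        beta = fit_group_lasso(X, y, groups, lam, weights)
        norms = np.array([np.linalg.norm(beta[idx]) for idx in groups])
        weights = 1.0 / (norms + eps)
    return beta
```

A typical call would be `X, groups, names = encode_groups(G)` followed by `beta = adaptive_group_lasso(X, y, groups, lam=0.1)`; the groups whose coefficient block has a nonzero norm are the selected main effects or interactions. Enumerating every pair as done here is only feasible for a small number of SNPs, which is consistent with the abstract's remark that a filtering step is needed before applying the method genome-wide.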

CONCLUSION

By using a wide range of simulated datasets and a real dataset from WTCCC, we demonstrate the advantages of our method.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c8c4/3203332/a6cf71017ef3/1471-2105-11-S1-S18-1.jpg
