Network-based regularization for high dimensional SNP data in the case-control study of Type 2 diabetes.

Authors

Ren Jie, He Tao, Li Ye, Liu Sai, Du Yinhao, Jiang Yu, Wu Cen

Affiliations

Department of Statistics, Kansas State University, 1116 Mid-Campus Drive N., 66506, Manhattan, KS, USA.

Department of Mathematics, San Francisco State University, San Francisco, CA, USA.

Publication

BMC Genet. 2017 May 16;18(1):44. doi: 10.1186/s12863-017-0495-5.

Abstract

BACKGROUND

Over the past decades, the prevalence of type 2 diabetes mellitus (T2D) has been steadily increasing around the world. Despite considerable effort devoted to better understanding the genetic basis of the disease, the identified susceptibility loci account for only a small portion of T2D heritability. Some of the existing approaches proposed for high dimensional genetic data from T2D case-control studies are limited in three ways: they analyze only a small number of SNPs at a time from a large pool of SNPs, they ignore the correlations among SNPs, and they adopt inefficient selection techniques.

METHODS

We propose a network-constrained regularization method that selects important SNPs by taking linkage disequilibrium into account. To accommodate the case-control design, an iteratively reweighted least squares algorithm has been developed within the coordinate descent framework: the regularized logistic loss function is optimized with respect to one parameter at a time, and the algorithm cycles iteratively through all the parameters until convergence.
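The algorithm described above can be illustrated with a minimal Python sketch. The abstract does not state the exact penalty, so this sketch assumes a lasso (L1) sparsity term plus a graph-Laplacian smoothness term, i.e. it minimizes the average logistic loss + lam1 * ||beta||_1 + (lam2 / 2) * beta' L beta. The function names, the correlation cutoff used to build the SNP network, and the tuning parameters lam1 and lam2 are all hypothetical choices made for illustration, not the authors' implementation.

```python
import numpy as np


def soft_threshold(x, t):
    """Scalar soft-thresholding operator used by the L1 update."""
    return np.sign(x) * max(abs(x) - t, 0.0)


def laplacian_from_correlation(X, cutoff=0.5):
    """Graph Laplacian of a SNP network built from pairwise |correlation|.

    Assumption: an edge is kept when |r| >= cutoff, weighted by |r|.
    The cutoff is a hypothetical stand-in for an LD-based network.
    """
    R = np.abs(np.corrcoef(X, rowvar=False))
    A = np.where(R >= cutoff, R, 0.0)
    np.fill_diagonal(A, 0.0)
    return np.diag(A.sum(axis=1)) - A


def fit_network_logistic(X, y, L, lam1, lam2, n_irls=50, n_cd=200, tol=1e-6):
    """Coordinate descent inside IRLS for a network-penalized logistic model.

    Assumed objective (not taken from the paper):
    mean logistic loss + lam1 * ||beta||_1 + (lam2 / 2) * beta' L beta.
    """
    n, p = X.shape
    beta, b0 = np.zeros(p), 0.0
    for _ in range(n_irls):
        beta_prev, b0_prev = beta.copy(), b0
        # IRLS step: quadratic approximation of the logistic loss.
        eta = b0 + X @ beta
        prob = 1.0 / (1.0 + np.exp(-eta))
        w = np.clip(prob * (1.0 - prob), 1e-5, None)  # working weights
        z = eta + (y - prob) / w                      # working response
        r = z - eta                                   # current residual
        for _ in range(n_cd):
            # Unpenalized intercept: weighted mean of the residual.
            shift = np.sum(w * r) / np.sum(w)
            b0 += shift
            r -= shift
            max_change = abs(shift)
            # Update one coefficient at a time, holding the others fixed.
            for j in range(p):
                xj = X[:, j]
                denom = np.sum(w * xj * xj) / n + lam2 * L[j, j]
                if denom <= 0.0:
                    continue
                neighbors = L[j] @ beta - L[j, j] * beta[j]
                rho = np.sum(w * xj * (r + xj * beta[j])) / n - lam2 * neighbors
                new = soft_threshold(rho, lam1) / denom
                if new != beta[j]:
                    r -= xj * (new - beta[j])
                    max_change = max(max_change, abs(new - beta[j]))
                    beta[j] = new
            if max_change < tol:
                break
        if max(np.max(np.abs(beta - beta_prev)), abs(b0 - b0_prev)) < tol:
            break
    return b0, beta
```

With a standardized genotype matrix X (samples by SNPs) and a 0/1 case-control vector y, a call such as `fit_network_logistic(X, y, laplacian_from_correlation(X), lam1=0.05, lam2=0.1)` returns the intercept and a sparse coefficient vector. The Laplacian term pulls the coefficients of connected SNPs toward each other, which is one common way to encode correlation among predictors in a penalty.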

RESULTS

In this article, a novel approach is developed to identify important SNPs more effectively by incorporating the interconnections among them into the regularized selection. A coordinate descent based iteratively reweighted least squares (IRLS) algorithm has been proposed.

CONCLUSIONS

Both the simulation study and the analysis of data from the Nurses' Health Study, a case-control study of type 2 diabetes with high dimensional SNP measurements, demonstrate the advantage of the network-based approach over the competing alternatives.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fe6a/5434559/e04e572c4f82/12863_2017_495_Fig1_HTML.jpg
