Penalized Logistic Regression Analysis for Genetic Association Studies of Binary Phenotypes.

Authors

Yu Ying, Chen Siyuan, Jones Samantha Jean, Hoque Rawnak, Vishnyakova Olga, Brooks-Wilson Angela, McNeney Brad

Publication information

Hum Hered. 2022 Jun 29. doi: 10.1159/000525650.

DOI:10.1159/000525650
PMID:35767963
Abstract

INTRODUCTION

Increasingly, logistic regression methods for genetic association studies of binary phenotypes must be able to accommodate data sparsity, which arises from unbalanced case-control ratios and/or rare genetic variants. Sparseness leads to maximum likelihood estimators (MLEs) of log-OR parameters that are biased away from their null value of zero and tests with inflated type 1 errors. Different penalized-likelihood methods have been developed to mitigate sparse-data bias. We study penalized logistic regression using a class of log-F priors indexed by a shrinkage parameter m to shrink the biased MLE towards zero. For a given m, log-F-penalized logistic regression may be easily implemented using data augmentation and standard software.
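The data-augmentation idea can be illustrated with a minimal sketch. It assumes the standard construction for a log-F(m,m) penalty: one pseudo-observation per penalized coefficient, with covariate 1 for that coefficient, 0 for the intercept, response 1/2, and weight m. The toy counts, the value m = 2, and the helper `fit_weighted_logistic` are hypothetical and are not taken from the paper.

```python
import numpy as np

def fit_weighted_logistic(X, y, w, n_iter=100, tol=1e-10):
    """Weighted logistic regression by Newton-Raphson (hypothetical helper)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (w * (y - p))                      # weighted score
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X  # weighted information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Toy sparse data (made up): 20 cases with 3 carriers, 200 controls with 5 carriers.
g = np.concatenate([np.ones(3), np.zeros(17), np.ones(5), np.zeros(195)])
y = np.concatenate([np.ones(20), np.zeros(200)])
X = np.column_stack([np.ones_like(g), g])  # columns: intercept, variant

# Ordinary maximum likelihood fit (all weights equal to 1).
beta_mle = fit_weighted_logistic(X, y, np.ones(len(y)))

# log-F(m,m) penalty on the variant effect via one pseudo-observation:
# intercept column 0, variant column 1, response 1/2, weight m.
m = 2.0  # assumed shrinkage parameter, for illustration only
X_aug = np.vstack([X, [0.0, 1.0]])
y_aug = np.append(y, 0.5)
w_aug = np.append(np.ones(len(y)), m)
beta_pen = fit_weighted_logistic(X_aug, y_aug, w_aug)

print("MLE log-OR:      ", beta_mle[1])
print("Penalized log-OR:", beta_pen[1])
```

The pseudo-observation contributes exp(m*beta/2) / (1 + exp(beta))^m to the likelihood, which is the log-F(m,m) density up to a constant, so any software that accepts weights or binomial counts can fit the penalized model. The intercept is left unpenalized here by setting its entry in the pseudo-row to 0.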

METHOD

We propose a two-step approach to the analysis of a genetic association study: first, a set of variants that show evidence of association with the trait is used to estimate m; and second, the estimated m is used for log-F-penalized logistic regression analyses of all variants using data augmentation with standard software. Our estimate of m is the maximizer of a marginal likelihood obtained by integrating the latent log-ORs out of the joint distribution of the parameters and observed data. We consider two approximate approaches to maximizing the marginal likelihood: (i) a Monte Carlo EM algorithm (MCEM) and (ii) a Laplace approximation (LA) to each integral, followed by derivative-free optimization of the approximation.
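As a rough sketch of the quantities involved, the marginal likelihood and its Laplace approximation can be written as follows. The notation is assumed for illustration: K is the number of variants used to estimate m, L_k is the likelihood contribution of variant k viewed as a function of its log-OR beta_k, and variants are treated as contributing independent factors. The abstract does not say how the intercept or other nuisance parameters are handled, so they are omitted here.

```latex
% log-F(m,m) prior density on a log-OR \beta
f(\beta \mid m) = \frac{1}{B(m/2,\, m/2)} \,
                  \frac{e^{m\beta/2}}{\left(1 + e^{\beta}\right)^{m}}

% marginal likelihood of m, with the latent log-ORs integrated out
L(m) = \prod_{k=1}^{K} \int L_k(\beta_k)\, f(\beta_k \mid m)\, d\beta_k ,
\qquad
\hat{m} = \arg\max_{m} L(m)

% Laplace approximation to each integral, with
% h_k(\beta; m) = \log L_k(\beta) + \log f(\beta \mid m)
% and \tilde{\beta}_k = \arg\max_{\beta} h_k(\beta; m)
\int e^{h_k(\beta; m)}\, d\beta \;\approx\;
e^{h_k(\tilde{\beta}_k;\, m)}
\sqrt{\frac{2\pi}{-\,h_k''(\tilde{\beta}_k;\, m)}}
```

Under this reading, the mode of each h_k is exactly a log-F-penalized estimate for the current m, so each evaluation of the approximate marginal likelihood reuses the penalized fit, and the resulting function of m can be passed to a derivative-free optimizer.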

RESULTS

We evaluate the statistical properties of our proposed two-step method and compare its performance to that of other shrinkage methods in a simulation study. The simulations suggest that the proposed log-F-penalized approach has lower bias and mean squared error than the other methods considered. We also illustrate the approach on data from a study of genetic associations with "super senior" cases and middle-aged controls.

DISCUSSION/CONCLUSION

We have proposed a method for single rare variant analysis with binary phenotypes by logistic regression penalized by log-F priors. Our method has the advantage of being easily extended to correct for confounding due to population structure and genetic relatedness through a data augmentation approach.
