A two-step method for variable selection in the analysis of a case-cohort study.

Affiliations

MRC Biostatistics Unit, Cambridge, UK.

MRC Epidemiology Unit, Cambridge, UK.

Publication information

Int J Epidemiol. 2018 Apr 1;47(2):597-604. doi: 10.1093/ije/dyx224.

Abstract

BACKGROUND

Accurate detection and estimation of true exposure-outcome associations are important in aetiological analysis. When there are multiple potential exposure variables of interest, methods are required for detecting the subset of variables most likely to have true associations with the outcome of interest. Case-cohort studies often collect data on a large number of variables which have not been measured in the entire cohort (e.g. panels of biomarkers). There is a lack of guidance on methods for variable selection in case-cohort studies.

METHODS

We describe and explore the application of three variable selection methods to data from a case-cohort study. These are: (i) selecting variables based on their level of significance in univariable (i.e. one-at-a-time) Prentice-weighted Cox regression models; (ii) stepwise selection applied to Prentice-weighted Cox regression; and (iii) a two-step method which applies a Bayesian variable selection algorithm to obtain posterior probabilities of selection for each variable using multivariable logistic regression followed by effect estimation using Prentice-weighted Cox regression.
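All three methods rely on Prentice-weighted Cox regression, so it may help to see what the weighting does to the risk sets. The sketch below is illustrative only and is not taken from the authors' R package: the `Subject` class, the `prentice_risk_sets` function and the toy data are hypothetical names introduced here, and tied event times are ignored. It assumes the usual Prentice scheme, in which subcohort members contribute to every risk set while they are under follow-up, and a case sampled from outside the subcohort contributes only to the risk set at its own event time.

```python
from dataclasses import dataclass


@dataclass
class Subject:
    # Hypothetical record type used only for this illustration.
    id: str
    time: float         # follow-up time (event or censoring)
    is_case: bool       # True if the subject had the event at `time`
    in_subcohort: bool  # True if sampled into the random subcohort


def prentice_risk_sets(subjects):
    """Map each case id to the ids in its Prentice-weighted risk set.

    Subcohort members are included whenever they are still under
    follow-up at the case's event time. A case outside the subcohort
    is added only to its own risk set (weight 1 at its event time,
    weight 0 before it).
    """
    risk_sets = {}
    for case in subjects:
        if not case.is_case:
            continue
        members = [
            s.id
            for s in subjects
            if s.in_subcohort and s.time >= case.time
        ]
        if not case.in_subcohort:
            members.append(case.id)
        risk_sets[case.id] = members
    return risk_sets


if __name__ == "__main__":
    toy = [
        Subject("a", 5.0, True, True),    # case inside the subcohort
        Subject("b", 3.0, True, False),   # case outside the subcohort
        Subject("c", 8.0, False, True),   # censored subcohort member
        Subject("d", 2.0, False, True),   # censored early
    ]
    for case_id, members in prentice_risk_sets(toy).items():
        print(case_id, members)
```

In this toy data the non-subcohort case `b` appears only in its own risk set and never in the risk set of case `a`, which is the defining feature of the weighting.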

RESULTS

Across nine different simulation scenarios, the two-step method demonstrated higher sensitivity and a lower false discovery rate than the one-at-a-time and stepwise methods. In an application of the methods to data from the EPIC-InterAct case-cohort study, the two-step method identified two additional fatty acids as being associated with incident type 2 diabetes, compared with the one-at-a-time and stepwise methods.
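For readers unfamiliar with the two performance measures, the sketch below shows how they could be computed for one simulated data set. It assumes the standard definitions (sensitivity as the proportion of truly associated variables that are selected, false discovery rate as the proportion of selected variables that are not truly associated); the abstract does not state the exact definitions used in the simulations, and the function name and variable labels are hypothetical.

```python
def selection_metrics(selected, truly_associated):
    """Return (sensitivity, false_discovery_rate) for one selection.

    sensitivity = true positives / number of truly associated variables
    false discovery rate = false positives / number of selected variables
    (taken as 0.0 when nothing is selected, an assumption made here).
    """
    selected = set(selected)
    truth = set(truly_associated)
    true_positives = len(selected & truth)
    false_positives = len(selected - truth)
    sensitivity = true_positives / len(truth) if truth else float("nan")
    fdr = false_positives / len(selected) if selected else 0.0
    return sensitivity, fdr


if __name__ == "__main__":
    # Hypothetical example: four variables truly associated, three selected.
    sens, fdr = selection_metrics(
        selected=["x1", "x2", "x5"],
        truly_associated=["x1", "x2", "x3", "x4"],
    )
    print(f"sensitivity={sens:.2f}, FDR={fdr:.2f}")
```

Averaging these two quantities over repeated simulated data sets gives the kind of summary on which the methods are compared.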

CONCLUSIONS

The two-step method enables more powerful and accurate detection of exposure-outcome associations in case-cohort studies. An R package is available to enable researchers to apply this method.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/398a/5913627/c3d3d6444683/dyx224f1.jpg
