Survival analysis under imperfect record linkage using historic census data.

Affiliations

Department of Biostatistics, Epidemiology & Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.

Department of Family Medicine and Community Health, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.

Publication

BMC Med Res Methodol. 2024 Mar 13;24(1):67. doi: 10.1186/s12874-024-02194-6.

Abstract

BACKGROUND

Advances in linking publicly available census records with vital and administrative records have enabled novel investigations in epidemiology and social history. However, in the absence of unique identifiers, linkage may be uncertain or may succeed for only a subset of the census cohort, resulting in missing data. In survival analysis, differential ascertainment of event times can bias inference on risk associations and median survival.
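To make the concern about differential ascertainment concrete, the following sketch simulates a hypothetical cohort with exponential survival times in which the chance that a death record is linked depends on the survival time itself (deaths after 20 years are assumed harder to link). The rate, the linkage probabilities, and the names `simulate_linkage` and `link_prob` are illustrative assumptions, not taken from the paper.

```python
import math
import random
import statistics


def simulate_linkage(n, rate, link_prob, seed=0):
    """Simulate a hypothetical cohort with outcome-dependent linkage.

    Survival times are exponential with the given rate. Each death record is
    linked with probability link_prob(t), which may depend on the time t.
    Returns (all survival times, survival times of linked individuals).
    """
    rng = random.Random(seed)
    all_times, linked_times = [], []
    for _ in range(n):
        t = rng.expovariate(rate)
        all_times.append(t)
        if rng.random() < link_prob(t):
            linked_times.append(t)
    return all_times, linked_times


def link_prob(t):
    # Hypothetical mechanism: later deaths are harder to link.
    return 0.8 if t < 20 else 0.4


all_times, linked_times = simulate_linkage(20000, rate=1 / 25, link_prob=link_prob)
true_median = 25 * math.log(2)  # median of an exponential with mean 25
print(f"true median:        {true_median:.1f}")
print(f"full-cohort median: {statistics.median(all_times):.1f}")
print(f"linked-only median: {statistics.median(linked_times):.1f}")
```

With these assumed settings the linked-only median comes out several years below the full-cohort median, which is the kind of bias a complete case analysis can inherit when linkage depends on the outcome.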

METHODS

We adapt several existing approaches commonly used to handle missing survival times to this imperfect-linkage setting, including complete case analysis, censoring, weighting, and several multiple imputation methods. We then conduct simulation studies to compare how well the proposed approaches estimate the association of a risk factor or exposure with survival, expressed as a hazard ratio (HR), and median survival times when some survival times are missing. We also explore how different missing data mechanisms and different exposure-survival associations affect their performance. The approaches are applied to a historic cohort of residents of Ambler, PA, established using the 1930 US census, in which only 2,440 of 4,514 individuals (54%) had death records retrievable from publicly available data sources and death certificates. Using this cohort, we examine the effects of occupational and paraoccupational asbestos exposure on survival, as well as disparities in mortality by race and gender.
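Weighting is one of the approaches listed above. A common form is inverse probability weighting: each linked individual is weighted by the inverse of the estimated probability of being linked, which can remove bias when linkage depends only on variables recorded for the whole census cohort (missing at random). The sketch below assumes exactly that: linkage depends only on a binary exposure, and the linkage rate per exposure group is estimated from the cohort. The hazards, linkage rates, 60-year follow-up, and all names are hypothetical, and the paper's own weighting scheme may differ. It compares the Kaplan-Meier median from linked individuals only (complete case) with the weighted estimate and with the full-cohort estimate that perfect linkage would give.

```python
import random


def weighted_km(times, events, weights):
    """Weighted Kaplan-Meier estimate as a list of (event time, S(t)) pairs."""
    data = sorted(zip(times, events, weights))
    at_risk = sum(weights)
    surv, curve, i = 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        dead_w = removed_w = 0.0
        # Pool all records tied at time t.
        while i < len(data) and data[i][0] == t:
            _, event, w = data[i]
            dead_w += w if event else 0.0
            removed_w += w
            i += 1
        if dead_w > 0:
            surv *= 1.0 - dead_w / at_risk
            curve.append((t, surv))
        at_risk -= removed_w
    return curve


def km_median(curve):
    """Smallest event time at which S(t) drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return float("inf")


rng = random.Random(1)
END = 60.0                     # hypothetical end of follow-up (years)
RATE = {1: 1 / 15, 0: 1 / 30}  # hypothetical hazards: exposed vs unexposed
LINK = {1: 0.3, 0: 0.8}        # hypothetical linkage probabilities

cohort = []  # (exposure, follow-up time, event indicator, linked?)
for _ in range(20000):
    x = 1 if rng.random() < 0.5 else 0
    t = rng.expovariate(RATE[x])
    cohort.append((x, min(t, END), t <= END, rng.random() < LINK[x]))

# Exposure is recorded in the census for everyone, so the linkage rate
# in each exposure group can be estimated directly.
p_hat = {}
for g in (0, 1):
    group = [c for c in cohort if c[0] == g]
    p_hat[g] = sum(c[3] for c in group) / len(group)

linked = [c for c in cohort if c[3]]
times = [c[1] for c in linked]
events = [c[2] for c in linked]

cc_median = km_median(weighted_km(times, events, [1.0] * len(linked)))
ipw_median = km_median(weighted_km(times, events, [1 / p_hat[c[0]] for c in linked]))
full_median = km_median(
    weighted_km([c[1] for c in cohort], [c[2] for c in cohort], [1.0] * len(cohort))
)
print(f"complete case: {cc_median:.1f}  IPW: {ipw_median:.1f}  full cohort: {full_median:.1f}")
```

Under this setup the complete-case median is pulled toward the longer survival of the better-linked unexposed group, while the weighted estimate should land close to the full-cohort value. If linkage also depended on the death time itself, group-level weights like these would not remove the bias.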

RESULTS

We show that imputation based on conditional survival yields less bias and greater efficiency than complete case analysis when estimating log-hazard ratios and median survival times. Applying the approaches to the Ambler cohort, we find a significant association between occupational exposure and mortality, particularly among black individuals and males, but no significant association between paraoccupational exposure and mortality.
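The conditional-survival imputation highlighted above can be sketched as follows, under stated assumptions. Because every census member is known to be alive at their census age, an unlinked individual's age at death can be drawn from an estimated survival curve conditional on surviving past that age. Here the curve is a Kaplan-Meier estimate on the age scale with delayed entry at census age, refit to a bootstrap resample of the linked individuals in each imputation, and the completed-data medians are averaged across imputations (the point-estimate part of Rubin's rules). The Gompertz mortality parameters, the age-dependent linkage probabilities, the number of imputations, and all function names are hypothetical. This is one plausible form of the approach, not necessarily the authors' implementation, and it is valid only if linkage is independent of age at death given census age.

```python
import bisect
import math
import random
import statistics


def lt_km(entry, death):
    """Kaplan-Meier on the age scale with delayed entry at census age.

    Assumes every record has an observed death (no censoring), as for
    linked records. Returns (distinct death ages, survival just after each).
    """
    entries, deaths = sorted(entry), sorted(death)
    times, surv, s, i = [], [], 1.0, 0
    while i < len(deaths):
        t = deaths[i]
        j = i
        while j < len(deaths) and deaths[j] == t:
            j += 1
        # At risk at age t: entered before t and not dead before t.
        at_risk = bisect.bisect_left(entries, t) - i
        s *= 1.0 - (j - i) / at_risk
        times.append(t)
        surv.append(s)
        i = j
    return times, surv


def draw_conditional(times, surv, neg_surv, age, rng):
    """Inverse-CDF draw of an age at death from the KM curve, given death > age."""
    k = bisect.bisect_right(times, age) - 1
    s_age = surv[k] if k >= 0 else 1.0
    if s_age <= 0.0:
        return max(age, times[-1])  # no estimated support beyond this age
    target = rng.random() * s_age
    # First death age whose survival is <= target (neg_surv is nondecreasing).
    idx = bisect.bisect_left(neg_surv, -target)
    return times[min(idx, len(times) - 1)]


# Hypothetical cohort: Gompertz mortality hazard B * exp(C * age).
B, C = 5e-5, 0.09
rng = random.Random(7)


def gompertz_death_age(age):
    # Solve H(d) - H(age) = E with E ~ Exp(1), where H(x) = (B / C) * (exp(C * x) - 1).
    e = rng.expovariate(1.0)
    return math.log(math.exp(C * age) + C * e / B) / C


census_age = [rng.uniform(20, 80) for _ in range(20000)]
death_age = [gompertz_death_age(a) for a in census_age]
# Hypothetical linkage that depends only on census age (missing at random).
linked = [rng.random() < (0.95 if a >= 60 else 0.1) for a in census_age]

lk_entry = [a for a, ok in zip(census_age, linked) if ok]
lk_death = [d for d, ok in zip(death_age, linked) if ok]
unlinked_age = [a for a, ok in zip(census_age, linked) if not ok]

M = 5  # number of imputations
medians = []
for _ in range(M):
    # Bootstrap the linked records so imputations reflect estimation uncertainty.
    idx = [rng.randrange(len(lk_entry)) for _ in range(len(lk_entry))]
    times, surv = lt_km([lk_entry[i] for i in idx], [lk_death[i] for i in idx])
    neg_surv = [-v for v in surv]
    imputed = [draw_conditional(times, surv, neg_surv, a, rng) for a in unlinked_age]
    medians.append(statistics.median(lk_death + imputed))

mi_median = statistics.mean(medians)  # Rubin's rules: average the point estimates
cc_median = statistics.median(lk_death)
true_median = statistics.median(death_age)
print(f"complete case: {cc_median:.1f}  imputed: {mi_median:.1f}  full data: {true_median:.1f}")
```

In this setup older census members are linked far more often, so the complete-case median age at death is pushed upward, while the imputed estimate should sit near the full-data value. A Rubin's rules variance would also need a within-imputation variance for each median, which this sketch does not compute.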

DISCUSSION

This investigation illustrates the strengths and weaknesses of different imputation methods for survival times that are missing because of imperfect linkage of administrative or registry data. The performance of the methods may depend on the missingness process, the parameter being estimated, and the model of interest; these factors should be considered when choosing a method to address missing event times.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ee60/10935812/182434234281/12874_2024_2194_Fig1_HTML.jpg