Validity of using multiple imputation for "unknown" stage at diagnosis in population-based cancer registry data.

Author information

Luo Qingwei, Egger Sam, Yu Xue Qin, Smith David P, O'Connell Dianne L

Affiliations

Cancer Research Division, Cancer Council NSW, Sydney, Australia.

Sydney School of Public Health, University of Sydney, Sydney, Australia.

Publication information

PLoS One. 2017 Jun 27;12(6):e0180033. doi: 10.1371/journal.pone.0180033. eCollection 2017.

DOI:10.1371/journal.pone.0180033

PMID:28654653

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5487067/

Abstract

BACKGROUND

The multiple imputation approach to missing data has been validated in a number of simulation studies that artificially induced missingness in fully observed stage data under a pre-specified missing data mechanism. However, the validity of multiple imputation has not yet been assessed using real data. The objective of this study was to assess, under real-world conditions, the validity of using multiple imputation for "unknown" prostate cancer stage recorded in the New South Wales Cancer Registry (NSWCR).

METHODS

Data from the NSW Prostate Cancer Care and Outcomes Study (PCOS), a population-based cohort study, were linked to 2000-2002 NSWCR data. For cases with "unknown" NSWCR stage, PCOS-stage was extracted from clinical notes. Logistic regression was used to evaluate the missing at random assumption, adjusting for the variables in each of two imputation models: a basic model including NSWCR variables only, and an enhanced model including the same NSWCR variables together with PCOS primary treatment. Cox regression was used to evaluate the performance of multiple imputation (MI).
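
One way a check of the missing at random assumption could be set up is sketched below. It is not the authors' code. It assumes that true stage is available for every case (as PCOS-stage was here), models the indicator of "unknown" registry stage with logistic regression, and asks whether true stage still predicts missingness once an observed variable is adjusted for. The cell counts, the single binary treatment variable, and the helper names `solve` and `fit_logistic` are all hypothetical and chosen only for illustration.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_logistic(rows, n_iter=25):
    """Fit a logistic regression to grouped data by Newton-Raphson.

    rows: list of (covariates, n_events, n_total); an intercept is added.
    Returns [intercept, coefficient_1, ...] on the log-odds scale.
    """
    p = len(rows[0][0]) + 1
    beta = [0.0] * p
    for _ in range(n_iter):
        grad = [0.0] * p
        info = [[0.0] * p for _ in range(p)]  # X'WX, the information matrix
        for x, k, n in rows:
            z = [1.0] + list(x)
            eta = sum(b * v for b, v in zip(beta, z))
            mu = 1.0 / (1.0 + math.exp(-eta))
            w = n * mu * (1.0 - mu)
            for i in range(p):
                grad[i] += (k - n * mu) * z[i]
                for j in range(p):
                    info[i][j] += w * z[i] * z[j]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


# Hypothetical counts, NOT study data:
# (true stage advanced?, treated?) -> (cases with "unknown" registry stage, all cases)
cells = {
    (0, 0): (50, 100),
    (1, 0): (150, 300),
    (0, 1): (80, 400),
    (1, 1): (20, 100),
}

# Missingness regressed on true stage only.
unadjusted = fit_logistic([([s], k, n) for (s, t), (k, n) in cells.items()])
# Missingness regressed on true stage, adjusted for treatment.
adjusted = fit_logistic([([s, t], k, n) for (s, t), (k, n) in cells.items()])

print("stage log-odds, unadjusted:", round(unadjusted[1], 3))
print("stage log-odds, adjusted for treatment:", round(adjusted[1], 3))
```

In these made-up counts, stage is associated with missingness on its own, but the association vanishes once treatment is in the model, which is the pattern a missing at random mechanism conditional on treatment would produce. A real analysis would use individual-level data, the full covariate set of each imputation model, and a formal test with standard errors.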

RESULTS

Of the 1864 prostate cancer cases, 32.7% were recorded as having "unknown" NSWCR stage. The missing at random assumption was satisfied when the logistic regression adjusted for the variables in the enhanced model, but not when it adjusted only for those in the basic model. The Cox models using data with imputed stage from either imputation model provided generally similar estimated hazard ratios, but with wider confidence intervals, compared with those derived from analysis of the data with PCOS-stage. However, the complete-case analysis provided considerably higher estimated hazard ratios for the low socio-economic status group and for rural areas than those obtained from all other datasets.
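
One likely contributor to the wider confidence intervals is the way estimates from multiply imputed datasets are combined. The sketch below pools log hazard ratios using Rubin's rules, where the total variance adds a between-imputation component to the average within-imputation variance. The abstract does not state how the authors pooled their results, so this is an assumption; the five estimates and standard errors are hypothetical, the function name `pool_rubin` is illustrative, and the interval uses a normal approximation rather than the t reference distribution.

```python
import math
import statistics


def pool_rubin(estimates, std_errors):
    """Pool per-imputation estimates with Rubin's rules.

    Returns (pooled estimate, within variance, between variance, total variance).
    """
    m = len(estimates)
    pooled = statistics.mean(estimates)
    within = statistics.mean([se ** 2 for se in std_errors])
    between = statistics.variance(estimates)  # sample variance, divisor m - 1
    total = within + (1 + 1 / m) * between
    return pooled, within, between, total


# Hypothetical log hazard ratios and standard errors from m = 5 imputed datasets.
log_hrs = [0.40, 0.50, 0.45, 0.55, 0.35]
ses = [0.10, 0.10, 0.10, 0.10, 0.10]

pooled, within, between, total = pool_rubin(log_hrs, ses)
pooled_se = math.sqrt(total)

# Normal-approximation 95% interval on the hazard ratio scale (an approximation).
low = math.exp(pooled - 1.96 * pooled_se)
high = math.exp(pooled + 1.96 * pooled_se)

print("pooled HR:", round(math.exp(pooled), 3))
print("pooled SE:", round(pooled_se, 4), "vs single-dataset SE:", ses[0])
print("approximate 95% CI:", round(low, 3), "to", round(high, 3))
```

Because the between-imputation variance is positive whenever the imputed datasets disagree, the pooled standard error exceeds the standard error from any single completed dataset, which widens the interval.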

CONCLUSIONS

Using MI to deal with "unknown" stage data recorded in a population-based cancer registry appears to provide valid estimates. We would recommend a cautious approach to the use of this method elsewhere.
