Bailly Laurent, Daurès Jean Pierre, Dunais Brigitte, Pradier Christian
Department of Public Health, University Hospital of Nice, Nice, France.
Département de Santé Publique, CHU Nice, Hôpital Archet 1. Niveau1, Route Saint Antoine de Ginestière, BP 3079 06202, Nice, Cedex, France.
BMC Med Res Methodol. 2015 Apr 24;15:39. doi: 10.1186/s12874-015-0029-7.
Cancer incidence and prevalence estimates are necessary to inform health policy, to predict public health impact and to identify etiological factors. Registries have been used to estimate the number of cancer cases. To be reliable and useful, cancer registry data should be complete. Capture-recapture, originally developed in ecology to estimate the size of animal populations, is a method for estimating the number of cases missed. In cancer epidemiology, capture-recapture methods model the overlap between lists of individuals using log-linear models. These models rely on the assumptions that sources are independent and that all individuals are equally catchable. Both assumptions are unlikely to hold in a cancer population, as severe cases are more likely to be captured than simple cases.
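The idea of estimating missed cases from the overlap between lists can be illustrated with the simplest possible setting. The sketch below is not the three-source log-linear or M(th) analysis of this study; it is a two-source simplification using the classical Lincoln-Petersen estimator and Chapman's bias-corrected variant, both of which assume independent sources and equal catchability. The function names and the counts (60, 50, 30) are hypothetical and chosen only for illustration.

```python
def lincoln_petersen(n1: int, n2: int, m: int) -> float:
    """Two-source estimate of total population size.

    n1, n2: number of cases on each list; m: cases found on both lists.
    Assumes the two sources are independent and all cases equally catchable.
    """
    if m == 0:
        raise ValueError("no overlap between lists: estimator is undefined")
    return n1 * n2 / m


def chapman(n1: int, n2: int, m: int) -> float:
    """Chapman's bias-corrected version, defined even when m == 0."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


# Hypothetical counts, NOT taken from the study.
n1, n2, m = 60, 50, 30
observed = n1 + n2 - m              # distinct cases seen by at least one source
n_hat = lincoln_petersen(n1, n2, m)
missed = n_hat - observed           # estimated cases missed by both sources

print(f"observed={observed}, estimated total={n_hat:.1f}, missed={missed:.1f}")
print(f"Chapman estimate={chapman(n1, n2, m):.2f}")
```

If severe cases are more likely to appear on every list, the overlap m is inflated and the estimate is biased downward, which is the motivation for models that relax the equal catchability assumption.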
To estimate the size of the cancer population and the completeness of the cancer registry, we applied M(th) models, which rely on parameters that influence capture, such as time of capture (t) and individual heterogeneity (h), and compared the results with those obtained with classical log-linear models and the sample coverage approach. Three sources collected breast and colorectal cancer cases (histopathological cancer registry, hospital Multidisciplinary Team Meetings, and cancer screening programmes). Individual heterogeneity is suspected in this cancer population due to age, gender, screening history or presence of metastases. Individual heterogeneity is rarely analysed, as classical log-linear models usually pool it with between-"list" dependence. We applied Bayesian Model Averaging, which, unlike the maximum likelihood estimation procedure, can be applied to small samples without asymptotic assumptions.
Cancer population estimates were based on the results of the M(h) model, with an averaged estimate of 803 cases of breast cancer and 521 cases of colorectal cancer. With the log-linear approach, the retained models gave estimates of 791 cases of breast cancer and 527 cases of colorectal cancer (729 and 481 histological cases, respectively).
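Registry completeness is conventionally expressed as the ratio of recorded cases to the estimated total. The sketch below applies that ratio to the figures reported above, under the assumption (not stated explicitly in the abstract) that the 729 and 481 histological cases are the counts recorded by the histopathological registry. The `completeness` helper is a hypothetical name introduced for illustration.

```python
def completeness(recorded: int, estimated_total: int) -> float:
    """Fraction of the estimated case population captured by the registry."""
    return recorded / estimated_total


# Assumption: 729 and 481 are the cases recorded by the histopathological registry.
recorded = {"breast": 729, "colorectal": 481}

# Estimated totals as reported in the abstract.
estimated = {
    "M(h), model-averaged": {"breast": 803, "colorectal": 521},
    "log-linear": {"breast": 791, "colorectal": 527},
}

results = {
    (model, site): completeness(recorded[site], total)
    for model, totals in estimated.items()
    for site, total in totals.items()
}

for (model, site), value in results.items():
    print(f"{model:22s} {site:11s} completeness = {value:.1%}")
```

Under this assumption, both approaches put registry completeness at roughly 91% to 92% for each cancer site.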
We applied M(th) models and Bayesian population estimation to a small sample of a cancer population. An advantage of M(th) models applied to cancer datasets is the ability to explore individual factors associated with capture heterogeneity, since the assumption of equal capture probability is unlikely to hold. M(th) models and Bayesian population estimation are well-suited for capture-recapture in a heterogeneous cancer population.