New York State Department of Health, Bureau of Cancer Epidemiology, Albany, New York.
J Registry Manag. 2023 Winter;50(4):138-143.
Social Security numbers (SSNs) collected by cancer surveillance registries in the United States are used for patient matching, deduplication, follow-up, and linkage studies. However, for various reasons, a small proportion of patient records have missing or inaccurate SSNs. Recently, New York State Cancer Registry (NYSCR) data were linked to LexisNexis data to obtain patient demographic information, including SSNs. The current study evaluated the feasibility of using LexisNexis to improve SSN information in the NYSCR.
NYSCR patients aged 21 or older who were diagnosed during 2005-2016 were linked to LexisNexis data. For matched patients, LexisNexis returned demographic information, including SSNs where available. The percentages of patients without a LexisNexis match, or without a LexisNexis SSN, were examined by demographic characteristics. We used multivariate logistic regression analyses to further evaluate how patient demographic characteristics affected the likelihood of having no LexisNexis match or no SSN returned. For patients with SSNs returned, LexisNexis SSNs were compared with registry SSNs. If a patient's registry SSN was missing, or if the LexisNexis SSN was inconsistent with the registry SSN, we used Match*Pro to review and verify match status. Registry SSNs were updated for patients confirmed to be true matches. Improvement in SSN information was assessed by the reduction in the percentage of patients with missing SSNs.
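The record-level decision flow in the methods can be sketched in Python. This is a minimal illustration under stated assumptions: the `PatientRecord` layout and its field names are hypothetical (not the NYSCR or LexisNexis schema), and the `confirmed_ids` set is a hypothetical stand-in for the outcome of the manual Match*Pro review, which is not modeled here. Only confirmed missing SSNs are filled in; inconsistent SSNs are merely flagged for review.

```python
from dataclasses import dataclass, replace
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class PatientRecord:
    # Hypothetical layout for illustration; not the NYSCR or LexisNexis schema.
    patient_id: str
    registry_ssn: Optional[str]    # None when the registry SSN is missing
    ln_matched: bool               # True if LexisNexis returned a match
    ln_ssn: Optional[str]          # None when no SSN was returned


def triage(rec: PatientRecord) -> str:
    """Sort a linked record into the outcome groups described in the methods."""
    if not rec.ln_matched:
        return "no_match"
    if rec.ln_ssn is None:
        return "no_ssn_returned"
    if rec.registry_ssn is None:
        return "review_missing"        # sent to manual review to fill a gap
    if rec.registry_ssn != rec.ln_ssn:
        return "review_inconsistent"   # sent to manual review as a possible error
    return "consistent"


def missing_pct(records: Iterable[PatientRecord]) -> float:
    """Percentage of records with no registry SSN."""
    recs = list(records)
    missing = sum(1 for r in recs if r.registry_ssn is None)
    return 100.0 * missing / len(recs)


def fill_confirmed_missing(records: Iterable[PatientRecord],
                           confirmed_ids: Set[str]) -> List[PatientRecord]:
    """Copy the LexisNexis SSN into a missing registry SSN, but only for
    records whose match was confirmed in review (confirmed_ids is a
    hypothetical stand-in for the Match*Pro decision)."""
    out = []
    for r in records:
        if (r.registry_ssn is None and r.ln_ssn is not None
                and r.patient_id in confirmed_ids):
            r = replace(r, registry_ssn=r.ln_ssn)
        out.append(r)
    return out


# Toy data with placeholder strings instead of real SSNs.
records = [
    PatientRecord("A", "ssn-1", True, "ssn-1"),
    PatientRecord("B", None, True, "ssn-2"),
    PatientRecord("C", None, True, "ssn-3"),
    PatientRecord("D", "ssn-4", True, "ssn-9"),
    PatientRecord("E", None, False, None),
]
groups = {r.patient_id: triage(r) for r in records}
before = missing_pct(records)                                # 3 of 5 missing -> 60.0
after = missing_pct(fill_confirmed_missing(records, {"B"}))  # only B confirmed -> 40.0
```

In this toy data, `before - after` is the absolute reduction in missingness (20 percentage points); record C had an SSN returned but was not confirmed, so it stays missing, mirroring the gap between SSNs returned and SSNs confirmed in the study.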
Of 1,396,078 patient records submitted for LexisNexis linkage, 1.6% were not matched. Among those matched, 1.5% did not have SSNs returned. Multivariate logistic regression analyses indicated that patients who were female, Black, Asian Pacific Islander (API), Hispanic, born outside the United States, deceased, or living in poorer census tracts were more likely to have no LexisNexis match or no SSN returned. Among 47,271 patients with missing registry SSNs (3.4%), 26,895 had SSNs returned from LexisNexis, and 24,919 of these were confirmed to be true matches. After the registry SSNs were updated, the percentage of patients with missing SSNs fell to 1.7%, with larger absolute reductions among patients who were younger than 60 years, API, or alive. Of 33,057 patients with inconsistent SSNs, 11,474 were due to incorrect consolidation of SSNs in the registry, and those SSNs were subsequently corrected.
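Several of these proportions can be cross-checked against the reported counts. The sketch below recomputes them in Python; apart from the baseline missingness, the ratios it derives (for example, the share of returned SSNs that were confirmed in review) are simple arithmetic on the counts given above, not figures stated by the authors.

```python
# Counts reported in the results above.
submitted = 1_396_078
missing_registry = 47_271
ln_ssn_returned = 26_895      # among patients with missing registry SSNs
confirmed = 24_919            # of those, confirmed true matches
inconsistent = 33_057
bad_consolidation = 11_474


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place for display."""
    return round(100.0 * part / whole, 1)


baseline_missing = pct(missing_registry, submitted)      # 3.4, as reported
returned_share = pct(ln_ssn_returned, missing_registry)
confirmed_share = pct(confirmed, ln_ssn_returned)
consolidation_share = pct(bad_consolidation, inconsistent)

# Relative reduction implied by the reported (rounded) before/after figures.
relative_reduction = (3.4 - 1.7) / 3.4                    # 0.5, i.e. about half
```

Rounded to one decimal place, the baseline reproduces the reported 3.4%; about 56.9% of patients with missing registry SSNs had one returned, about 92.7% of those were confirmed, and about 34.7% of the inconsistent SSNs traced to incorrect consolidation.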
LexisNexis is a valuable resource for improving the quality of SSN information in registries. Our results showed that the overall percentage of patients with missing SSNs was reduced from 3.4% to 1.7% after LexisNexis linkage, and that incorrectly consolidated SSNs were identified and corrected for some patients. However, the magnitude of SSN improvement varied by patient demographic characteristics. Data quality improvements often require resources, and this evaluation can assist registries in deciding whether to undertake similar efforts.