Medicine, Stanford University, Stanford, California, USA.
Political Science, Stanford University, Stanford, California, USA.
Inj Prev. 2020 Apr;26(2):153-158. doi: 10.1136/injuryprev-2019-043385. Epub 2019 Oct 29.
Virtually all existing evidence linking access to firearms to elevated risks of mortality and morbidity comes from ecological and case-control studies. To improve understanding of the health risks and benefits of firearm ownership, we launched a cohort study: the Longitudinal Study of Handgun Ownership and Transfer (LongSHOT).
Using probabilistic matching techniques, we linked three sources of individual-level, state-wide data in California: official voter registration records, an archive of lawful handgun transactions and all-cause mortality data. There were nearly 28.8 million unique voter registrants, 5.5 million handgun transfers and 3.1 million deaths during the study period (18 October 2004 to 31 December 2016). The linkage relied on several identifying variables (first, middle and last names; date of birth; sex; residential address) that were available in all three data sets and that we deployed in a series of bespoke algorithms.
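The abstract does not describe the bespoke algorithms themselves, so the following is only a minimal sketch of one common form of probabilistic matching (Fellegi-Sunter-style agreement weights) applied to the link variables named above. The `Person` type, the `M_U` probabilities and the example records are all hypothetical and invented for illustration; they are not the study's parameters or data.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    first: str
    middle: str
    last: str
    dob: str       # ISO date string, e.g. "1970-01-31"
    sex: str
    address: str


# Hypothetical (m, u) pairs per link variable:
#   m = P(field agrees | records belong to the same person)
#   u = P(field agrees | records belong to different people)
# These values are assumptions made up for this sketch.
M_U = {
    "first": (0.95, 0.01),
    "middle": (0.80, 0.05),
    "last": (0.95, 0.005),
    "dob": (0.98, 0.0001),
    "sex": (0.99, 0.5),
    "address": (0.70, 0.0001),
}


def normalize(value: str) -> str:
    """Basic cleaning: uppercase and collapse whitespace."""
    return " ".join(value.upper().split())


def is_exact(a: Person, b: Person) -> bool:
    """True when every link variable agrees after cleaning."""
    return all(normalize(getattr(a, f)) == normalize(getattr(b, f)) for f in M_U)


def match_weight(a: Person, b: Person) -> float:
    """Sum of log2 likelihood ratios over all link variables."""
    total = 0.0
    for field, (m, u) in M_U.items():
        agree = normalize(getattr(a, field)) == normalize(getattr(b, field))
        total += math.log2(m / u) if agree else math.log2((1 - m) / (1 - u))
    return total


# Invented example records (not real people).
voter = Person("Maria", "L", "Lopez", "1970-01-31", "F", "12 Oak St")
buyer_same = Person("MARIA", "l", "Lopez ", "1970-01-31", "F", "12 oak st")
buyer_moved = Person("Maria", "L", "Lopez", "1970-01-31", "F", "99 Elm Ave")
stranger = Person("John", "Q", "Smith", "1982-06-15", "M", "5 Pine Rd")

for candidate in (buyer_same, buyer_moved, stranger):
    print(is_exact(voter, candidate), round(match_weight(voter, candidate), 1))
```

In a sketch like this, a pair would be accepted as a link when its weight exceeds a chosen threshold; a record with a changed address still scores well above zero, while an unrelated record scores well below it. At the scale reported here, candidate pairs would also need to be restricted (for example by blocking on date of birth) rather than compared exhaustively.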
Assembly of the LongSHOT cohort commenced in January 2016 and was completed in March 2019. Approximately three-quarters of matches identified were exact matches on all link variables. The cohort consists of 28.8 million adult residents of California followed for up to 12.2 years. A total of 1.2 million cohort members purchased at least one handgun during the study period, and 1.6 million died.
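The maximum follow-up of 12.2 years is consistent with the study period stated above. A quick check, assuming a 365.25-day year (the variable names are illustrative):

```python
from datetime import date

# Study period as reported in the abstract.
study_start = date(2004, 10, 18)
study_end = date(2016, 12, 31)

follow_up_days = (study_end - study_start).days
follow_up_years = follow_up_days / 365.25  # assumed average year length

print(follow_up_days, round(follow_up_years, 1))
```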
Three steps taken early may be particularly useful in enhancing the efficiency of large-scale data linkage: thorough data cleaning; assessment of the suitability of off-the-shelf data linkage packages relative to bespoke coding; and careful consideration of the minimum sample size and matching precision needed to support rigorous investigation of the study questions.