Herrchen B, Gould J B, Nesbitt T S
Health Information Solutions, Menlo Park, California, USA.
Comput Biomed Res. 1997 Aug;30(4):290-305. doi: 10.1006/cbmr.1997.1448.
A methodology for linking the vital statistics linked birth/death data with hospital discharge data is described. The resulting data set combines information on a neonate's sociodemographic characteristics, prenatal care, and mortality with detailed health outcome and resource utilization data, thus establishing an extensive database for epidemiological studies. In the absence of a universal identifier common to both databases, our linkage strategy relied on a virtual identifier built from variables common to both data sets. When the same virtual identifier occurred more than once, we used secondary health status information to optimize the likelihood of linking low birth weight or premature infants in one database to infants of similar health status in the other, and randomized the cases for which no secondary information was present. Applying our method to the 1992 California birth cohort, we could link 563,114 out of 571,189 eligible births (98.59%). Of these links, 91.2% were established on the basis of unique virtual identifiers. The link was internally consistent, and no bias was evident when the variable distributions for all single live births in the vital statistics linked birth/death file were compared with those for the linked births in the combined vital statistics linked birth/death and hospital discharge file. Multiple imputation techniques showed that the prediction error incurred by randomization was negligible. Although computationally intensive, our method for linking the vital statistics linked birth/death file and the hospital discharge file appeared to be effective. However, it is important to be aware of the limitations of the resulting data set, in particular the fact that it cannot be used for tracking individual cases.
The method provides a database suitable for a variety of perinatal epidemiological analyses, such as descriptive studies of disease distribution in neonates, studies of the geographic distribution of disease, and studies of the relationship between risk and outcome.
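The linkage strategy described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the variables that form the virtual identifier or the secondary health status information, so the field names here (`birth_date`, `sex`, `hospital`, `zip_code`, `high_risk`, `id`) and the function names are hypothetical. Records whose virtual identifier is unique in both files are linked directly; records sharing an identifier are shuffled and then ordered so that high-risk infants are paired with high-risk infants first, which leaves the remaining pairings random.

```python
import random
from collections import defaultdict

# Hypothetical key variables; the abstract does not list the ones actually used.
KEY_FIELDS = ("birth_date", "sex", "hospital", "zip_code")


def virtual_id(record):
    """Build the virtual identifier from variables common to both files."""
    return tuple(record[field] for field in KEY_FIELDS)


def group_by_key(records):
    """Group records by their virtual identifier."""
    groups = defaultdict(list)
    for record in records:
        groups[virtual_id(record)].append(record)
    return groups


def link(vital_records, discharge_records, seed=0):
    """Return (vital_id, discharge_id, link_type) tuples.

    link_type is "unique" when the virtual identifier occurs once in each
    file, and "duplicate" when it had to be resolved with secondary
    information or by randomization.
    """
    rng = random.Random(seed)
    vital = group_by_key(vital_records)
    discharge = group_by_key(discharge_records)
    links = []
    for key, v_group in vital.items():
        d_group = discharge.get(key, [])
        if not d_group:
            continue  # no candidate in the discharge file: record stays unlinked
        if len(v_group) == 1 and len(d_group) == 1:
            links.append((v_group[0]["id"], d_group[0]["id"], "unique"))
            continue
        v_sorted, d_sorted = list(v_group), list(d_group)
        # Shuffle first so that ties (no secondary information) are random.
        rng.shuffle(v_sorted)
        rng.shuffle(d_sorted)
        # Stable sort puts high-risk records first in both lists, so they
        # are paired with each other before the remaining records.
        v_sorted.sort(key=lambda r: not r.get("high_risk", False))
        d_sorted.sort(key=lambda r: not r.get("high_risk", False))
        for v_rec, d_rec in zip(v_sorted, d_sorted):
            links.append((v_rec["id"], d_rec["id"], "duplicate"))
    return links
```

The share of links tagged `"unique"` corresponds to the kind of figure the abstract reports (91.2% of links). Because pairings among duplicates are partly random, a data set linked this way supports aggregate analyses but not the tracking of individual cases, in line with the limitation stated above.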