Adams MM, Wilson HG, Casto DL, Berg CJ, McDermott JM, Gaudino JA, McCarthy BJ
World Health Organization Collaborating Center in Perinatal Care and Health Services Research in Maternal Child Health, Division of Reproductive Health, Atlanta, GA, USA.
Am J Epidemiol. 1997 Feb 15;145(4):339-48. doi: 10.1093/oxfordjournals.aje.a009111.
Certificates of 1,449,287 live births and fetal deaths filed in Georgia from 1980 through 1992 were linked to create chronologies that, excluding induced abortions and ectopic pregnancies, constituted the reproductive experience of individual women. The authors initially used a deterministic method (whereby linking rules were not based on probability theory) to link as many records as possible, knowing that some of the linkages would be incorrect. They subsequently used a probabilistic method (whereby evaluation of linkages was developed from probability theory) to evaluate each linkage, and they broke those that were judged to be incorrect. Of the 1.4 million records, 38% did not link to another record. From the remaining records, 369,686 chains of two or more events were constructed. The longest chain included 12 events. Of the chains, 69% included two events; 22% included three events. Longer chains tended to have lower scores for probable validity. The probability-based evaluation of chains affected 3.0% of the records that had been in chains at the end of the deterministic linkage. A greater percentage of records in longer chains were affected by the evaluation. Unfortunately, the small subset of records that were the most difficult to link tended to overrepresent groups with the greatest risk of adverse pregnancy outcomes. Researchers contemplating a similar linkage can anticipate that, for the majority of records, linkage can be accomplished with a relatively straightforward, deterministic approach.
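The two-stage approach described above (link liberally with deterministic rules, then score each link and break the doubtful ones) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the abstract does not name the linking variables, the scoring formula, or the cutoff. The field names, the sample records, the m/u probabilities, and the threshold are all hypothetical, and the log-ratio agreement weights follow the common Fellegi-Sunter style, which the source does not confirm was used.

```python
from collections import defaultdict
from math import log2

# Hypothetical m/u probabilities per comparison:
# m = P(agree | same woman), u = P(agree | different women).
FIELD_PARAMS = {
    "birthplace": (0.95, 0.10),
    "parity": (0.90, 0.20),
}
THRESHOLD = 0.0  # hypothetical cutoff; links scoring below it are broken

# Illustrative records only; none of these fields or values come from the study.
records = [
    {"id": 1, "mother_name": "ANN LEE", "mother_dob": "1960-03-02",
     "birthplace": "GA", "prior_events": 0, "event_date": "1981-05-01"},
    {"id": 2, "mother_name": "ANN LEE", "mother_dob": "1960-03-02",
     "birthplace": "GA", "prior_events": 1, "event_date": "1983-07-15"},
    {"id": 3, "mother_name": "ANN LEE", "mother_dob": "1960-03-02",
     "birthplace": "TX", "prior_events": 0, "event_date": "1990-01-20"},
    {"id": 4, "mother_name": "MARY ROE", "mother_dob": "1958-11-30",
     "birthplace": "GA", "prior_events": 0, "event_date": "1985-02-10"},
]


def deterministic_chains(recs):
    """Stage 1: group records on an exact key, then order each group by date."""
    groups = defaultdict(list)
    for rec in recs:
        groups[(rec["mother_name"], rec["mother_dob"])].append(rec)
    # ISO-formatted dates sort correctly as plain strings.
    return [sorted(g, key=lambda r: r["event_date"]) for g in groups.values()]


def link_score(earlier, later):
    """Sum log2 agreement or disagreement weights for one link in a chain."""
    agreements = {
        "birthplace": earlier["birthplace"] == later["birthplace"],
        # Consistent if the later record reports exactly one more prior event.
        "parity": later["prior_events"] == earlier["prior_events"] + 1,
    }
    score = 0.0
    for field, agrees in agreements.items():
        m, u = FIELD_PARAMS[field]
        score += log2(m / u) if agrees else log2((1 - m) / (1 - u))
    return score


def evaluate_chains(chains, threshold=THRESHOLD):
    """Stage 2: split a chain wherever a link scores below the threshold."""
    result = []
    for chain in chains:
        current = [chain[0]]
        for earlier, later in zip(chain, chain[1:]):
            if link_score(earlier, later) >= threshold:
                current.append(later)
            else:
                result.append(current)
                current = [later]
        result.append(current)
    return result


chains = deterministic_chains(records)
final = evaluate_chains(chains)
print([[r["id"] for r in chain] for chain in final])
```

In this toy data the exact-key step places records 1, 2, and 3 in one chain. The scoring step then splits record 3 off, because both its birthplace and its count of prior events conflict with record 2. A real linkage would need weights estimated from the data and a cutoff chosen by review of borderline links.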