SAIL Databank, Swansea University Medical School, Swansea, United Kingdom.
Swansea University Medical School, Swansea, United Kingdom.
PLoS One. 2020 Feb 11;15(2):e0228545. doi: 10.1371/journal.pone.0228545. eCollection 2020.
A key requirement for longitudinal studies using routinely collected health data is the ability to determine which individuals are present in the datasets used, and over what time period. Individuals can enter and leave the population covered by administrative datasets for a variety of reasons, including both life events and characteristics of the datasets themselves. An automated, customizable method of determining individuals' presence was developed for the primary care dataset in Swansea University's SAIL Databank. The primary care dataset covers only a portion of Wales, with 76% of practices participating. The start and end dates of the data vary by practice. Additionally, individuals can change practices or leave Wales. To address these issues, a two-step process was developed. First, the period for which each practice had data available was calculated by measuring changes in the rate of events recorded over time. Second, the registration records for each individual were simplified, with anomalies such as short gaps and overlaps resolved by applying a set of rules. The result of these two analyses was a cleaned set of records indicating the start and end dates of available primary care data for each individual. Analysis of GP records showed that 91.0% of events occurred within periods that the algorithm calculated as having available data. Of those events, 98.4% were observed at the same practice of registration as that computed by the algorithm. A standardized method for solving this common problem has enabled faster development of studies using this dataset. Using a rigorous, tested, standardized method of verifying presence in the study population will also positively influence the quality of research.
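The two-step process described in the abstract can be illustrated with a minimal Python sketch. It is not the SAIL Databank implementation: the abstract does not state the actual rules or thresholds, so every name here (`Registration`, `practice_data_period`, `simplify_registrations`, `clip_to_practice_data`) is hypothetical, and the specific choices (a cut-off at half the median monthly event count, a 30-day gap tolerance, and letting the later registration win when two practices overlap) are assumptions made only for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from statistics import median
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Registration:
    practice_id: str
    start: date
    end: date  # inclusive


def month_end(first_of_month: date) -> date:
    """Last day of the month that starts on `first_of_month`."""
    nxt = date(first_of_month.year + first_of_month.month // 12,
               first_of_month.month % 12 + 1, 1)
    return nxt - timedelta(days=1)


def practice_data_period(monthly_events: Dict[date, int],
                         fraction: float = 0.5) -> Optional[Tuple[date, date]]:
    """Step 1 (assumed rule): a practice has data from the first to the last
    month whose event count reaches `fraction` of its median monthly count.
    Keys of `monthly_events` are the first day of each month."""
    counts = [c for c in monthly_events.values() if c > 0]
    if not counts:
        return None
    threshold = fraction * median(counts)
    active = sorted(m for m, c in monthly_events.items() if c >= threshold)
    return active[0], month_end(active[-1])


def simplify_registrations(records: List[Registration],
                           max_gap_days: int = 30) -> List[Registration]:
    """Step 2 (assumed rules) for one individual's records: merge same-practice
    records that overlap or are separated by a short gap, and resolve overlaps
    between different practices in favour of the later registration."""
    merged: List[Registration] = []
    for rec in sorted(records, key=lambda r: (r.start, r.end)):
        if merged:
            prev = merged[-1]
            gap = (rec.start - prev.end).days - 1  # unregistered days between
            if rec.practice_id == prev.practice_id and gap <= max_gap_days:
                merged[-1] = Registration(prev.practice_id, prev.start,
                                          max(prev.end, rec.end))
                continue
            if rec.start <= prev.end:
                # Different practices overlap: cut the earlier record short,
                # dropping it if nothing remains.
                cut = rec.start - timedelta(days=1)
                merged.pop()
                if cut >= prev.start:
                    merged.append(Registration(prev.practice_id, prev.start, cut))
        merged.append(rec)
    return merged


def clip_to_practice_data(records: List[Registration],
                          periods: Dict[str, Tuple[date, date]]) -> List[Registration]:
    """Combine both steps: keep only the part of each registration that falls
    inside the period for which its practice has data available."""
    clipped = []
    for rec in records:
        period = periods.get(rec.practice_id)
        if period is None:
            continue  # practice not in the dataset
        start, end = max(rec.start, period[0]), min(rec.end, period[1])
        if start <= end:
            clipped.append(Registration(rec.practice_id, start, end))
    return clipped
```

In this sketch, `simplify_registrations` is applied to one individual's records at a time, and its output is passed to `clip_to_practice_data` along with a mapping from practice to the period returned by `practice_data_period`. The result corresponds to the cleaned start and end dates of available primary care data that the abstract describes.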