Welch Catherine, Bartlett Jonathan, Petersen Irene
University College London, London, UK
London School of Hygiene & Tropical Medicine, London, UK
Stata J. 2014 Apr 1;14(2):418-431.
Electronic health records of longitudinal clinical data are a valuable resource for health care research. One obstacle to using databases of health records in epidemiological analyses is that general practitioners mainly record data when they are clinically relevant. If we treat the unavailability of measurements as a missing-data problem, we can use existing methods to handle missing data, such as multiple imputation (MI). However, most software implementations of MI do not take account of the longitudinal and dynamic structure of the data and are difficult to implement in large databases with millions of individuals and long follow-up. Nevalainen, Kenward, and Virtanen (2009, Statistics in Medicine 28: 3657-3669) proposed the two-fold fully conditional specification algorithm to impute missing data in longitudinal data. It imputes missing values at a given time point, conditional on information at the same time point and immediately adjacent time points. In this article, we describe a new command that implements the two-fold fully conditional specification algorithm. It is extended to accommodate MI of longitudinal clinical records in large databases.
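The idea of imputing at one time point conditional on the immediately adjacent time points can be illustrated with a minimal Python sketch. This is not the command described in the article. The function names `fit_line` and `twofold_sketch`, the parameter `n_among`, and the data layout (one row per individual, one column per time point, `None` for a missing value) are all hypothetical. The sketch also makes strong simplifying assumptions: a single variable per time point (so the within-time iteration reduces to one step), a simple linear regression on the mean of the adjacent time points, no draw of the regression parameters from their posterior, and at least one observed value at every time point.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, residual_sd)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx > 0 else 0.0
    a = my - b * mx
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    dof = max(len(xs) - 2, 1)
    sd = (sum(r * r for r in resid) / dof) ** 0.5
    return a, b, sd


def twofold_sketch(rows, n_among=5, seed=0):
    """Return a completed copy of rows (hypothetical helper, illustration only).

    rows: list of per-individual lists, one value per time point, None = missing.
    n_among: number of passes over all time points (among-time iterations).
    """
    n_times = len(rows[0])
    if n_times < 2:
        raise ValueError("need at least two time points")
    rng = random.Random(seed)
    missing = [[v is None for v in row] for row in rows]

    # Start from a crude fill: the observed mean at each time point.
    col_means = [
        statistics.fmean(r[t] for r in rows if r[t] is not None)
        for t in range(n_times)
    ]
    filled = [
        [col_means[t] if v is None else v for t, v in enumerate(row)]
        for row in rows
    ]

    for _ in range(n_among):
        for t in range(n_times):
            # Condition only on the immediately adjacent time points.
            window = [s for s in (t - 1, t + 1) if 0 <= s < n_times]
            x = [statistics.fmean(row[s] for s in window) for row in filled]
            obs = [i for i in range(len(rows)) if not missing[i][t]]
            a, b, sd = fit_line([x[i] for i in obs], [filled[i][t] for i in obs])
            # Replace only the originally missing values at time t.
            for i in range(len(rows)):
                if missing[i][t]:
                    filled[i][t] = a + b * x[i] + rng.gauss(0.0, sd)
    return filled
```

Calling the function several times with different `seed` values would give several completed data sets, in the spirit of MI. A proper implementation would also draw the regression parameters and cycle over multiple variables within each time point.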