Kontopantelis Evangelos, Parisi Rosa, Springate David A, Reeves David
NIHR School for Primary Care Research, University of Manchester, Williamson Building, Oxford Road, Manchester, M13 9PL, UK.
Farr Institute for Health Informatics Research, University of Manchester, Vaughan House, Portsmouth Street, Manchester, M13 9GB, UK.
BMC Res Notes. 2017 Jan 13;10(1):41. doi: 10.1186/s13104-016-2365-z.
In modern health care systems, the computerization of all aspects of clinical care has led to the development of large data repositories. For example, in the UK, large primary care databases hold millions of electronic medical records, with detailed information on diagnoses, treatments, outcomes and consultations. Careful analyses of these observational datasets of routinely collected data can complement evidence from clinical trials or even answer research questions that cannot be addressed in an experimental setting. However, 'missingness' is a common problem for routinely collected data, especially for biological parameters measured over time. The absence of complete data for the whole of an individual's study period is a potential source of bias, and standard complete-case approaches may lead to biased estimates. In addition, the structure of the data values makes standard cross-sectional multiple-imputation approaches unsuitable. In this paper we propose and evaluate mibmi, a new command for cleaning and imputing longitudinal body mass index data.
The regression-based data cleaning aspects of the algorithm can be useful when researchers analyze messy longitudinal data. Although the multiple imputation algorithm is computationally expensive, it performed similarly to, or even better than, existing alternatives when interpolating observations.
The mibmi algorithm can be a useful tool for analyzing longitudinal body mass index data, or other longitudinal data with very low individual-level variability.
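To make the idea of regression-based cleaning followed by interpolation concrete, here is a minimal Python sketch. It is not the mibmi implementation (mibmi is a command, and its internals are not described in this abstract). The function names, the 5 kg/m² residual threshold, and the example measurements are all assumptions chosen for illustration. The sketch fits a straight line to one individual's BMI series, iteratively drops the point that deviates most from the line if it exceeds the threshold, and then fills a gap by linear interpolation between the neighbouring observations.

```python
def fit_line(times, values):
    """Ordinary least-squares line through (time, value) pairs; returns (slope, intercept)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:  # all measurements at the same time: fall back to a flat line
        return 0.0, mean_v
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, mean_v - slope * mean_t


def flag_implausible(times, values, max_residual=5.0):
    """Indices of values far from the individual's own fitted trend.

    Hypothetical rule (an assumption, not taken from mibmi): refit after
    removing the single worst point until every residual is within
    max_residual (kg/m^2) or only two points remain.
    """
    keep = list(range(len(times)))
    flagged = []
    while len(keep) > 2:
        slope, intercept = fit_line([times[i] for i in keep],
                                    [values[i] for i in keep])
        worst = max(keep, key=lambda i: abs(values[i] - (intercept + slope * times[i])))
        if abs(values[worst] - (intercept + slope * times[worst])) <= max_residual:
            break
        keep.remove(worst)
        flagged.append(worst)
    return sorted(flagged)


def interpolate(times, values, target):
    """Linear interpolation between the nearest observations on either side.

    Assumes distinct measurement times; returns None outside the observed range.
    """
    points = sorted(zip(times, values))
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if t0 <= target <= t1:
            return v0 + (v1 - v0) * (target - t0) / (t1 - t0)
    return None


# Made-up series for one individual: 72.6 is a plausible typo for 27.6.
years = [0, 1, 2, 3, 4]
bmi = [27.0, 27.4, 72.6, 27.9, 28.1]

flagged = flag_implausible(years, bmi)  # only index 2 is flagged
clean_years = [t for i, t in enumerate(years) if i not in flagged]
clean_bmi = [v for i, v in enumerate(bmi) if i not in flagged]
estimate = interpolate(clean_years, clean_bmi, 2)  # about 27.65
print(flagged, round(estimate, 2))
```

A single interpolated value like this understates uncertainty. A multiple-imputation approach would instead generate several plausible values for each gap and combine the resulting estimates.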