Hoevenaar-Blom Marieke P, Guillemont Juliette, Ngandu Tiia, Beishuizen Cathrien R L, Coley Nicola, Moll van Charante Eric P, Andrieu Sandrine, Kivipelto Miia, Soininen Hilkka, Brayne Carol, Meiller Yannick, Richard Edo
Department of Neurology, Academic Medical Center, University of Amsterdam, Amsterdam, The Netherlands.
INSERM, University of Toulouse, Toulouse, France.
PLoS One. 2017 Sep 12;12(9):e0182362. doi: 10.1371/journal.pone.0182362. eCollection 2017.
Lack of attention to missing data in research may result in biased results, loss of power and reduced generalizability. Registering reasons for missing values at the time of data collection or, in the case of sharing existing data, before making data available to other teams can save time and effort, improve scientific value, and help to prevent erroneous assumptions and biased results. To ensure that the encoding of missing data is sufficient to convey why data are missing, it should ideally be context-free. Therefore, 11 context-free codes for missing data were designed based on three completed randomized controlled clinical trials and tested in a new randomized controlled clinical trial. The international team consisted of clinical researchers and epidemiologists with extensive experience in designing and conducting trials, and an information systems expert. The codes can be divided into missing due to participant and/or participation characteristics (n = 6), missing by design (n = 4), and missing due to a procedural error (n = 1). Broad implementation of context-free missing data encoding may enhance the possibilities for data sharing and pooling, thus allowing more powerful analyses using existing data.
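As a rough illustration of how such an encoding might be used in an analysis script, the sketch below maps 11 reason codes onto the three categories named above and tallies missing values by category. The abstract gives only the number of codes per category, not the codes themselves, so the negative integer values, the names `MissingCategory`, `CODE_TO_CATEGORY` and `summarize_missing`, and the example measurements are all hypothetical. The sketch also assumes the measured variable cannot legitimately be negative.

```python
from collections import Counter
from enum import Enum


class MissingCategory(Enum):
    """The three groups of missing-data codes named in the abstract."""
    PARTICIPANT = "participant and/or participation characteristics"
    BY_DESIGN = "missing by design"
    PROCEDURAL_ERROR = "procedural error"


# Hypothetical numbering: only the per-category counts (6, 4, 1) come from
# the abstract; the integer values are placeholders, not the authors' codes.
CODE_TO_CATEGORY = {}
for code in range(-1, -7, -1):       # -1 .. -6: six participant-related codes
    CODE_TO_CATEGORY[code] = MissingCategory.PARTICIPANT
for code in range(-7, -11, -1):      # -7 .. -10: four by-design codes
    CODE_TO_CATEGORY[code] = MissingCategory.BY_DESIGN
CODE_TO_CATEGORY[-11] = MissingCategory.PROCEDURAL_ERROR  # one procedural code


def summarize_missing(values):
    """Count missing entries per category; any other value is treated as observed."""
    counts = Counter()
    for value in values:
        category = CODE_TO_CATEGORY.get(value)
        if category is not None:
            counts[category] += 1
    return counts


# Made-up measurements with four coded missing values.
example = [120, -3, 135, -8, -11, -2]
for category, n in summarize_missing(example).items():
    print(f"{category.value}: {n}")
```

Keeping the reason in the code itself, rather than in a study-specific note, is what would let pooled datasets from different trials be summarized with the same lookup.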