Garza Maryam Y, Williams Tremaine, Ounpraseuth Songthip, Hu Zhuopei, Lee Jeannette, Snowden Jessica, Walden Anita C, Simon Alan E, Devlin Lori A, Young Leslie W, Zozus Meredith N
Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, the United States of America; University of Texas Health Science Center at San Antonio, Joe R. & Teresa Lozano Long School of Medicine, San Antonio, TX, the United States of America.
Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, the United States of America.
Int J Med Inform. 2025 Mar;195:105749. doi: 10.1016/j.ijmedinf.2024.105749. Epub 2024 Dec 4.
In clinical research, preventing data errors is paramount to ensuring the reproducibility of trial results and the safety and efficacy of the resulting interventions. Over the last 40 years, empirical assessments of data accuracy in clinical research have been reported; however, there has been little systematic synthesis of these results. Although notable exceptions exist, there is little evidence regarding the relative accuracy of different data processing methods.
A systematic review of the literature indexed in PubMed was performed to identify studies that evaluated the quality of data obtained through data processing methods typically used in clinical trials. Quantitative information on data accuracy was abstracted from the manuscripts and pooled. Meta-analyses of single proportions, based on the Freeman-Tukey transformation method and the generalized linear mixed model approach, were used to derive overall estimates of error rates across the data processing methods used in each study, for comparison.
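To illustrate what pooling single proportions with a Freeman-Tukey transformation can look like, here is a minimal sketch. It assumes a fixed-effect, inverse-variance pooling of double-arcsine transformed proportions and a simple sin² back-transformation; the abstract does not specify these details, so this is not the authors' exact procedure, and it does not cover the generalized linear mixed model approach. The function names and the study counts in the usage note are hypothetical.

```python
import math


def freeman_tukey(errors, fields):
    """Double-arcsine transform of one proportion and its approximate variance."""
    t = (math.asin(math.sqrt(errors / (fields + 1)))
         + math.asin(math.sqrt((errors + 1) / (fields + 1))))
    variance = 1.0 / (fields + 0.5)
    return t, variance


def back_transform(t):
    """Approximate inverse of the transform (assumption: simple sin^2 form)."""
    t = min(max(t, 0.0), math.pi)  # keep the value in the valid range
    return math.sin(t / 2.0) ** 2


def pool_error_rate(studies, z=1.96):
    """Fixed-effect pooled error rate with an approximate 95% interval.

    `studies` is a list of (errors, fields_inspected) pairs.
    Returns (estimate, lower, upper) as proportions.
    """
    total_weight = 0.0
    weighted_sum = 0.0
    for errors, fields in studies:
        t, variance = freeman_tukey(errors, fields)
        weight = 1.0 / variance  # inverse-variance weight
        total_weight += weight
        weighted_sum += weight * t
    pooled = weighted_sum / total_weight
    se = math.sqrt(1.0 / total_weight)
    return (back_transform(pooled),
            back_transform(pooled - z * se),
            back_transform(pooled + z * se))
```

For example, `pool_error_rate([(12, 4000), (30, 10000), (3, 2500)])` pools three made-up studies, each given as errors found over fields inspected. A random-effects model would widen the interval when studies disagree, which is likely relevant given the variability the abstract reports.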
A total of 93 papers (published from 1978 to 2008) meeting our inclusion criteria were categorized according to their data processing methods. The accuracy associated with data processing methods varied widely, with error rates ranging from 2 errors per 10,000 fields (0.02%) to 2,784 errors per 10,000 fields (27.84%). Medical record abstraction (MRA) was associated with both high and highly variable error rates, with a pooled error rate of 6.57% (95% CI: 5.51, 7.72). In comparison, the pooled error rates for optical scanning, single-data entry, and double-data entry methods were 0.74% (95% CI: 0.21, 1.60), 0.29% (95% CI: 0.24, 0.35), and 0.14% (95% CI: 0.08, 0.20), respectively.
Data processing methods may explain a substantial share of the variability in data accuracy. MRA error rates, for example, were high enough to affect decisions made using the data and could necessitate larger sample sizes to preserve statistical power. Thus, the choice of data processing method can likely affect process capability and, ultimately, the validity of trial results.
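One way to see why data errors can require larger samples is the following sketch. It rests on strong assumptions that are not in the abstract: a binary outcome compared between two arms, errors that flip the recorded outcome with the same probability in both arms (nondifferential misclassification), and the standard normal-approximation formula for comparing two proportions. The function names and the event rates in the usage note are hypothetical.

```python
import math
from statistics import NormalDist


def observed_rate(true_rate, error_rate):
    """Event rate seen in the data when each record is flipped with `error_rate`."""
    return true_rate * (1 - error_rate) + (1 - true_rate) * error_rate


def n_per_arm(p1, p2, alpha=0.05, power=0.80):
    """Per-arm sample size for a two-sided test comparing two proportions."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil(z ** 2 * variance_sum / (p1 - p2) ** 2)


def n_per_arm_with_errors(p1, p2, error_rate, alpha=0.05, power=0.80):
    """Per-arm sample size when both arms are recorded with `error_rate`."""
    return n_per_arm(observed_rate(p1, error_rate),
                     observed_rate(p2, error_rate),
                     alpha, power)
```

Comparing `n_per_arm(0.30, 0.20)` with `n_per_arm_with_errors(0.30, 0.20, 0.0657)` shows the effect under these assumptions: the errors shrink the observed difference between arms by a factor of (1 - 2 × error rate), so more participants are needed to detect it. Treating the pooled MRA field error rate as an outcome misclassification rate is itself an illustrative simplification.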