Division of Urology, Department of Surgery, University of Melbourne, Royal Melbourne Hospital and the Australian Prostate Cancer Research Centre Epworth, Melbourne, Victoria, Australia.
BMJ Open. 2013 May 28;3(5):e002406. doi: 10.1136/bmjopen-2012-002406.
Data errors are a well-documented feature of clinical datasets, as is their potential to confound downstream analysis. In this study, we explore the reliability of manually transcribed data across different pathology fields in a prostate cancer database and also measure error rates attributable to the source data.
Descriptive study.
Specialist urology service at a single centre in metropolitan Victoria in Australia.
Between 2004 and 2011, 1471 patients underwent radical prostatectomy at our institution. In a large proportion of these cases, clinicopathological variables were recorded by manual data entry. In 2011, we obtained electronic versions of the same printed pathology reports for our cohort. The data were electronically imported in parallel to any existing manual entry record, enabling direct comparison between the two.
Error rates of manually entered data compared with electronically imported data across clinicopathological fields.
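As an illustration only, a field-by-field comparison of this kind could be sketched as follows. This is not the authors' implementation: the function name `compare_records`, the dictionary layout, the normalisation rule, and the example field names (`gleason`, `stage`) are all assumptions made for the sketch. The electronically imported value is treated as the reference, and a patient is included only when enough fields are populated in both records.

```python
from collections import defaultdict


def normalise(value):
    """Hypothetical normalisation: compare values as trimmed, lower-cased text."""
    return str(value).strip().lower()


def compare_records(manual, electronic, min_fields=10):
    """Compare manually entered fields against electronically imported ones.

    `manual` and `electronic` map a patient ID to a dict of field -> value.
    Patients with fewer than `min_fields` comparable fields are skipped.
    """
    field_totals = defaultdict(int)   # comparisons made per field
    field_errors = defaultdict(int)   # mismatches found per field
    eligible = 0
    fully_concordant = 0

    for patient_id, manual_fields in manual.items():
        electronic_fields = electronic.get(patient_id)
        if electronic_fields is None:
            continue

        # A field is comparable only if both records hold a value for it.
        shared = [
            name for name, value in manual_fields.items()
            if value is not None and electronic_fields.get(name) is not None
        ]
        if len(shared) < min_fields:
            continue

        eligible += 1
        mismatches = 0
        for name in shared:
            field_totals[name] += 1
            if normalise(manual_fields[name]) != normalise(electronic_fields[name]):
                field_errors[name] += 1
                mismatches += 1
        if mismatches == 0:
            fully_concordant += 1

    total = sum(field_totals.values())
    return {
        "eligible_patients": eligible,
        "fully_concordant_patients": fully_concordant,
        "overall_error_rate": sum(field_errors.values()) / total if total else 0.0,
        "field_error_rates": {
            name: field_errors[name] / field_totals[name] for name in field_totals
        },
    }
```

The sketch reports both a patient-level figure (how many patients have no discrepancy in any compared field) and field-level error rates, which are the two kinds of measure an analysis like this would need. Exact string matching after light normalisation is a deliberate simplification; free-text fields would likely need a more tolerant comparison.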
A total of 421 patients had at least 10 comparable pathology fields between the electronic import and manual records and were selected for study. Of these, 320 patients had concordant data between manually entered and electronically populated fields across a median of 12 pathology fields (range 10-13), indicating outright accuracy of manually entered pathology data in 76% of patients. Across all fields, the error rate was 2.8%, while individual field error rates ranged from 0.5% to 6.4%. Fields in text formats were significantly more error-prone than those with direct measurements or involving numerical figures (p<0.001). A total of 971 cases were available for review of error within the source data, with error rates of 0.1-0.9%.
While the overall rate of error was low in manually entered data, individual pathology fields were variably prone to error. High-quality pathology data can be obtained for both prospective and retrospective parts of our data repository and the electronic checking of source pathology data for error is feasible.