
Mass data massage: an automated data processing system used for NHEXAS, Arizona. National Human Exposure Assessment Survey.

Authors

O'Rourke M K, Fernandez L M, Bittel C N, Sherrill J L, Blackwell T S, Robbins D R

Affiliation

Environmental and Occupational Health Unit of the Arizona Prevention Center, The University of Arizona, Tucson 85721-0468, USA.

Publication

J Expo Anal Environ Epidemiol. 1999 Sep-Oct;9(5):471-84. doi: 10.1038/sj.jea.7500043.

DOI:10.1038/sj.jea.7500043
PMID:10554149
Abstract

Data entry and management are critical components of all large survey projects; data quality objectives must be met and data must be quickly and readily accessible. We developed a comprehensive system for data entry and management utilizing scannable forms with bubble fields and handwriting recognition. This 'Mass Data Massage' (MDM) system had three components: (1) form creation and database definition; (2) programming of data dictionaries for documentation and preliminary logic and range checks; and (3) data entry, management and documentation using the 'Mass Data Cleaning Program' (MDCP). Scannable forms were written in Teleform, where the data field definition, variable names and ranges were defined as the form was created. Completed forms were returned from the field, subjected to final field quality control (QC) checks, and transferred to the data management section. They were batched and coded as necessary. Once a batch of data was scanned and visually verified, the operator called up the menu for the MDCP. The MDCP had 31 program modules with 500-1200 lines of code each. The operator could select and run the appropriate dictionary on each data batch 'correcting' apparent errors in responses. This process was iterative until the data batch passed all dictionary checks. Proposed 'changes' were forwarded to the data coordinator (DC) for acceptance or rejection. After all errors had been resolved, each data batch was subjected to a 10% quality assurance (QA) check. The original data batch and associated file of applied changes were archived. Time expenditure using the scanning approach varied with the number of questions and the types of responses (handwritten or bubble fields). One-page forms took 42-60% of the time needed for hand entry; forms longer than 10 pages took 35-38% of the time. Use of faster machines will further speed the process. The main advantage of the system was the reduction of systematic errors. 
Scanning alone reduced errors found on 995 NHEXAS Baseline Questionnaires. Overall, the dictionary identified 0.55% errors on the scanned forms. Ten percent QC checks, performed on corrected batches ready for appendage to the master database, revealed an overall error rate of 0.02%. Similar checks on a laboratory form scanned from numeric handwriting detected 0.3% errors following dictionary application and 0.2% errors during the 10% QA check. This system was faster, more accurate, and more cost-effective than hand entry of data. A batch of data that took >1 week to process using the hand entry method was processed within 1 day using MDM. Human coding of specific answers and the final verification were the most time-consuming processes.
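The dictionary stage described in the abstract (range and cross-field logic checks, iterative correction until a batch passes, a logged file of applied changes, and a 10% QA re-check) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the MDCP code, which the abstract describes only as 31 modules of 500-1200 lines each. The names `DictEntry`, `check_batch`, `clean_batch`, `qa_sample`, the example fields, and the `propose_fix` callback (standing in for operator correction plus data-coordinator approval) are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional


# Hypothetical data-dictionary entry: one variable, its allowed range and an
# optional cross-field logic rule. Field names used below are illustrative only.
@dataclass
class DictEntry:
    name: str
    low: Optional[float] = None
    high: Optional[float] = None
    rule: Optional[Callable[[dict], bool]] = None  # returns True if consistent


def check_record(record: dict, dictionary: list) -> list:
    """Return the problems found in one record (an empty list means it passes)."""
    problems = []
    for entry in dictionary:
        value = record.get(entry.name)
        if value is None:
            problems.append(f"{entry.name}: missing")
            continue
        if entry.low is not None and value < entry.low:
            problems.append(f"{entry.name}: {value} below {entry.low}")
        if entry.high is not None and value > entry.high:
            problems.append(f"{entry.name}: {value} above {entry.high}")
        if entry.rule is not None and not entry.rule(record):
            problems.append(f"{entry.name}: failed logic check")
    return problems


def check_batch(batch: list, dictionary: list) -> dict:
    """Map record index -> problems, for every record that fails a check."""
    failures = {}
    for i, record in enumerate(batch):
        problems = check_record(record, dictionary)
        if problems:
            failures[i] = problems
    return failures


def clean_batch(batch, dictionary, propose_fix, max_rounds=10):
    """Re-run the dictionary until the batch passes, logging every applied change.

    Mutates batch in place, so keep a copy of the original for archiving.
    propose_fix(index, record) returns {field: new_value} (possibly empty).
    """
    change_log = []
    for _ in range(max_rounds):
        if not check_batch(batch, dictionary):
            return change_log
        for i in check_batch(batch, dictionary):
            for field, new_value in propose_fix(i, batch[i]).items():
                change_log.append((i, field, batch[i].get(field), new_value))
                batch[i][field] = new_value
    if check_batch(batch, dictionary):
        raise RuntimeError("batch still fails dictionary checks")
    return change_log


def qa_sample(batch: list, percent: int = 10, seed: int = 0) -> list:
    """Choose about percent% of record indices (at least one) for a QA re-check."""
    if not batch:
        return []
    k = max(1, (len(batch) * percent + 99) // 100)  # integer ceiling
    return random.Random(seed).sample(range(len(batch)), k)


if __name__ == "__main__":
    # Illustrative two-field dictionary; not an actual NHEXAS variable set.
    dictionary = [
        DictEntry("age", 0, 120),
        DictEntry("years_at_address", 0, 120,
                  rule=lambda r: r["years_at_address"] <= r.get("age", 0)),
    ]
    batch = [{"age": 34, "years_at_address": 5}, {"age": 340, "years_at_address": 5}]
    print(check_batch(batch, dictionary))  # {1: ['age: 340 above 120']}
    # Pretend the operator checked the paper form and the true age is 34.
    log = clean_batch(batch, dictionary,
                      lambda i, r: {"age": 34} if r["age"] > 120 else {})
    print(log)  # [(1, 'age', 340, 34)]
    print(qa_sample(batch))  # one index: 10% of 2 records rounds up to 1
```

Passing `propose_fix` as a callback keeps the human decision (re-reading the paper form, coordinator sign-off) outside the automated loop, mirroring the abstract's split between dictionary checks and DC review. The integer ceiling in `qa_sample` keeps the sample size exact without floating-point rounding.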


Similar articles

1
Mass data massage: an automated data processing system used for NHEXAS, Arizona. National Human Exposure Assessment Survey.
J Expo Anal Environ Epidemiol. 1999 Sep-Oct;9(5):471-84. doi: 10.1038/sj.jea.7500043.
2
Electronic case-report forms of symptoms and impairments of peripheral neuropathy.
Can J Neurol Sci. 2002 Aug;29(3):258-66. doi: 10.1017/s0317167100002043.
3
Forms control and error detection procedures used at the Coordinating Center of the Multiple Risk Factor Intervention Trial (MRFIT).
Control Clin Trials. 1986 Sep;7(3 Suppl):34S-45S. doi: 10.1016/0197-2456(86)90158-3.
4
Pivot/Remote: a distributed database for remote data entry in multi-center clinical trials.
Medinfo. 1995;8 Pt 2:1097.
5
[Analysis, identification and correction of some errors of model refseqs appeared in NCBI Human Gene Database by in silico cloning and experimental verification of novel human genes].
Yi Chuan Xue Bao. 2004 May;31(5):431-43.
6
Accuracy of bar codes versus handwriting for recording trauma resuscitation events.
Ann Emerg Med. 1993 Oct;22(10):1545-50. doi: 10.1016/s0196-0644(05)81256-9.
7
Structure and software tools of AIDA.
Comput Methods Programs Biomed. 1987 Nov-Dec;25(3):259-73. doi: 10.1016/0169-2607(87)90083-6.
8
A comparison of error detection rates between the reading aloud method and the double data entry method.
Control Clin Trials. 2003 Oct;24(5):560-9. doi: 10.1016/s0197-2456(03)00089-8.
9
A Swiss cheese error detection method for real-time EPID-based quality assurance and error prevention.
Med Phys. 2017 Apr;44(4):1212-1223. doi: 10.1002/mp.12142. Epub 2017 Mar 17.
10
Quality of data entry using single entry, double entry and automated forms processing--an example based on a study of patient-reported outcomes.
PLoS One. 2012;7(4):e35087. doi: 10.1371/journal.pone.0035087. Epub 2012 Apr 6.

Cited by

1
Error Rates of Data Processing Methods in Clinical Research: A Systematic Review and Meta-Analysis of Manuscripts Identified Through PubMed.
Res Sq. 2023 Dec 21:rs.3.rs-2386986. doi: 10.21203/rs.3.rs-2386986/v2.
2
Effects of organisational and patient factors on doctors' burnout: a national survey in China.
BMJ Open. 2019 Jul 1;9(7):e024531. doi: 10.1136/bmjopen-2018-024531.
3
Incorporating scannable forms into immunization data collection processes: a mixed-methods study.
PLoS One. 2012;7(12):e49627. doi: 10.1371/journal.pone.0049627. Epub 2012 Dec 18.
4
Arsenic exposure, diabetes prevalence, and diabetes control in the Strong Heart Study.
Am J Epidemiol. 2012 Nov 15;176(10):865-74. doi: 10.1093/aje/kws153. Epub 2012 Oct 24.
5
Quantifying data quality for clinical trials using electronic data capture.
PLoS One. 2008 Aug 25;3(8):e3049. doi: 10.1371/journal.pone.0003049.