Shewade Hemant Deepak, Vidhubala E, Subramani Divyaraj Prabhakar, Lal Pranay, Bhatt Neelam, Sundaramoorthi C, Singh Rana J, Kumar Ajay M V
a Department of Operational Research, International Union Against Tuberculosis and Lung Disease (The Union), South-East Asia Office, New Delhi, India.
b Department of Psycho-oncology, Cancer Institute (Women's India Association), Chennai, India.
Glob Health Action. 2017;10(1):1394763. doi: 10.1080/16549716.2017.1394763.
A large state-wide tobacco survey was conducted in Tamil Nadu, India, in 2015-2016 using a modified version of the pretested, globally validated Global Adult Tobacco Survey (GATS) questionnaire. Due to resource constraints, data were collected using paper-based questionnaires (unlike GATS-India 2009-2010, which used hand-held computer devices), and data entry was done using open-access tools. The objective of this paper is to describe the data entry process and assess its quality assurance and efficiency.
In EpiData terminology, a variable is referred to as a 'field' and a questionnaire (a set of fields) as a 'record'. EpiData software was used for double data entry with adequate checks, followed by validation. TeamViewer was used for remote training and troubleshooting. The EpiData databases (one for each district and one for each zone in Chennai city) were housed in shared Dropbox folders, which enabled secure file sharing and automatic back-up. Each district/zone database had separate files for entering the household-level and individual-level questionnaires.
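The core idea behind validating double data entry is to compare the two independently keyed copies of each record field by field and flag every disagreement for checking against the paper form. The sketch below illustrates that comparison in plain Python; it is not EpiData's own validation routine, and the record IDs (e.g. "HH001") and field names (e.g. "q1") are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: record ID -> {field name -> entered value}.
Record = Dict[str, str]


def compare_double_entry(
    first: Dict[str, Record], second: Dict[str, Record]
) -> List[Tuple[str, str, str, str]]:
    """Return (record_id, field, first_value, second_value) for each mismatch."""
    discrepancies = []
    for rec_id in sorted(set(first) | set(second)):
        # A record keyed in only one of the two files is itself a discrepancy.
        if rec_id not in first or rec_id not in second:
            discrepancies.append(
                (rec_id, "<record>", str(rec_id in first), str(rec_id in second))
            )
            continue
        a, b = first[rec_id], second[rec_id]
        # Compare every field present in either copy; a missing field counts as "".
        for field in sorted(set(a) | set(b)):
            va, vb = a.get(field, ""), b.get(field, "")
            if va != vb:
                discrepancies.append((rec_id, field, va, vb))
    return discrepancies


if __name__ == "__main__":
    first_entry = {"HH001": {"q1": "2", "q2": "5"}, "HH002": {"q1": "1", "q2": "3"}}
    second_entry = {"HH001": {"q1": "2", "q2": "5"}, "HH002": {"q1": "1", "q2": "8"}}
    diffs = compare_double_entry(first_entry, second_entry)
    print(diffs)  # [('HH002', 'q2', '3', '8')]
    # Proportion of records with at least one data entry error.
    erroneous = {d[0] for d in diffs}
    print(len(erroneous) / len(first_entry))  # 0.5
```

Each flagged record would then be corrected against the paper questionnaire; this step is what catches keying errors that a single entry would miss.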
The 32,945 households surveyed included 111,363 individuals aged ≥15 years. The average proportion of records per district/zone with data entry errors was 4% in the household-level file and 24% in the individual-level file. These errors would have gone unnoticed had single data entry been used. The median (interquartile range) time taken for double data entry of a single household-level and individual-level questionnaire was 30 (24, 40) s and 86 (64, 126) s, respectively.
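The two kinds of summary reported here can be sketched as follows: an average of per-district/zone error proportions, and a median with interquartile range of per-questionnaire entry times. This assumes an unweighted mean across districts/zones and uses Python's `statistics` module with the "inclusive" quartile method; the district names, counts, and timings are made up for illustration, and other software may use a different quartile definition and give slightly different IQRs.

```python
import statistics
from typing import Dict, List, Tuple


def mean_error_proportion(district_counts: Dict[str, Tuple[int, int]]) -> float:
    """Unweighted mean of per-district proportions of records with errors.

    district_counts maps a (hypothetical) district name to
    (records_with_errors, total_records).
    """
    proportions = [errs / total for errs, total in district_counts.values() if total > 0]
    return statistics.mean(proportions)


def median_iqr(seconds: List[float]) -> Tuple[float, float, float]:
    """Return (median, first quartile, third quartile) of entry times in seconds."""
    q1, _, q3 = statistics.quantiles(seconds, n=4, method="inclusive")
    return statistics.median(seconds), q1, q3


if __name__ == "__main__":
    counts = {"District A": (2, 50), "District B": (6, 50)}
    print(mean_error_proportion(counts))  # roughly 0.08
    print(median_iqr([20, 30, 40, 50, 60]))  # (40, 30.0, 50.0)
```

Averaging per-district proportions gives each district/zone equal weight regardless of its size; pooling all records instead would weight larger districts more heavily.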
Efficient, quality-assured, near-real-time data entry in a large sub-national tobacco survey was achieved through the innovative, resource-efficient use of open-access tools.