Li Yi, Guo Guang
Department of Sociology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA.
Biodemography Soc Biol. 2014;60(2):212-28. doi: 10.1080/19485565.2014.953029.
This article introduces a novel way of using genetic data collected in social surveys for data quality control. Genetic information can help detect and repair data problems such as missing data, reporting errors, discrepancies among measures of the same variable, and otherwise flawed data. Using data from two surveys, the College Roommate Study (ROOM) and the National Longitudinal Study of Adolescent Health (Add Health), we show three applications. First, the proportion identical-by-descent (IBD) score, a measure of genetic relatedness, can identify misreported and unreported sibling types and detect misrepresented participants. Second, the bio-ancestry score, a measure of ancestral population membership, can recover missing race and reconcile discrepancies among different measures of self-reported race. Third, sex-chromosome information may help cross-check self-reported sex. This article represents an initial effort to use genetic data for data quality control. As genetic data become increasingly available, researchers may explore further approaches to improving data quality.
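As a rough illustration of the sibling-type check described in the abstract, the sketch below compares an estimated IBD proportion against the textbook expected values for common relationships (about 1.0 for monozygotic twins, 0.5 for full siblings and dizygotic twins, 0.25 for half siblings, 0 for unrelated pairs). The function names, the labels, and the 0.1 tolerance are hypothetical choices made for this example; they are not the procedure or thresholds used in the article.

```python
from typing import Optional

# Expected proportion of the genome shared identical by descent (IBD).
# These are standard expected values; realized values vary around them.
EXPECTED_IBD = {
    "monozygotic twins": 1.0,
    "full siblings": 0.5,   # dizygotic twins share the same expectation
    "half siblings": 0.25,
    "unrelated": 0.0,
}


def infer_sibling_type(ibd: float, tolerance: float = 0.1) -> Optional[str]:
    """Return the relationship whose expected IBD is closest to `ibd`.

    Returns None when no expected value lies within `tolerance`
    (an assumed cutoff, not one taken from the article).
    """
    label, expected = min(EXPECTED_IBD.items(), key=lambda kv: abs(kv[1] - ibd))
    return label if abs(expected - ibd) <= tolerance else None


def check_report(reported: Optional[str], ibd: float) -> str:
    """Compare a self-reported sibling type with the genetically inferred one."""
    inferred = infer_sibling_type(ibd)
    if inferred is None:
        return "ambiguous"
    if reported is None:
        # Missing report: the genetic estimate fills the gap.
        return f"recovered: {inferred}"
    if reported == inferred:
        return "consistent"
    return f"flag: reported {reported}, genetic data suggest {inferred}"


if __name__ == "__main__":
    print(check_report("full siblings", 0.48))
    print(check_report(None, 0.27))
    print(check_report("full siblings", 0.02))
```

In practice, a pair flagged this way would presumably be reviewed rather than automatically recoded, since estimated IBD proportions carry sampling noise and the cutoffs are a judgment call.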