Family-based association tests using genotype data with uncertainty.

Affiliation

Department of Statistics, University of California, Irvine, CA 92697, USA.

Publication information

Biostatistics. 2012 Apr;13(2):228-40. doi: 10.1093/biostatistics/kxr045. Epub 2011 Dec 8.

DOI:10.1093/biostatistics/kxr045

PMID:22156512

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC3297829/

Abstract

Family-based association studies have been widely used to identify association between diseases and genetic markers. It is known that genotyping uncertainty is inherent in both directly genotyped or sequenced DNA variations and imputed data in silico. The uncertainty can lead to genotyping errors and missingness and can negatively impact the power and Type I error rates of family-based association studies even if the uncertainty is independent of disease status. Compared with studies using unrelated subjects, there are very few methods that address the issue of genotyping uncertainty for family-based designs. The limited attempts have mostly been made to correct the bias caused by genotyping errors. Without properly addressing the issue, the conventional testing strategy, i.e. family-based association tests using called genotypes, can yield invalid statistical inferences. Here, we propose a new test to address the challenges in analyzing case-parents data by using calls with high accuracy and modeling genotype-specific call rates. Our simulations show that compared with the conventional strategy and an alternative test, our new test has an improved performance in the presence of substantial uncertainty and has a similar performance when the uncertainty level is low. We also demonstrate the advantages of our new method by applying it to imputed markers from a genome-wide case-parents association study.
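For orientation, the conventional strategy the abstract refers to (a family-based test on called genotypes) can be sketched as follows. This is a minimal illustration and not the new test proposed in the paper. It assumes a standard transmission disequilibrium test (TDT) as the family-based test, genotype posterior probabilities given as `(P(0), P(1), P(2))` copies of the tested allele, and a hypothetical calling threshold of 0.9. The function names are illustrative only.

```python
from math import erfc, sqrt

def call_genotype(probs, threshold=0.9):
    """Return the most probable genotype (0, 1 or 2 allele copies),
    or None when its probability is below the (assumed) threshold."""
    best = max(range(3), key=lambda g: probs[g])
    return best if probs[best] >= threshold else None

def tdt_counts(trios, threshold=0.9):
    """Count transmissions (b) and non-transmissions (c) of the tested
    allele from heterozygous parents to affected children."""
    b = c = 0
    for father, mother, child in trios:
        f, m, k = (call_genotype(p, threshold) for p in (father, mother, child))
        if f is None or m is None or k is None:
            continue  # trio dropped: at least one genotype could not be called
        n_het = (f == 1) + (m == 1)
        # Copies that homozygous parents transmit with certainty.
        from_hom = sum(g // 2 for g in (f, m) if g != 1)
        transmitted = k - from_hom
        if not 0 <= transmitted <= n_het:
            continue  # Mendelian inconsistency, e.g. from a wrong call
        b += transmitted
        c += n_het - transmitted
    return b, c

def tdt_test(b, c):
    """McNemar-type TDT statistic and its chi-square (1 df) p-value."""
    if b + c == 0:
        return 0.0, 1.0
    stat = (b - c) ** 2 / (b + c)
    return stat, erfc(sqrt(stat / 2.0))

# Toy data: (father, mother, child) posterior probabilities per trio.
trios = [
    ((0, 1, 0), (1, 0, 0), (0, 1, 0)),      # het father transmits the allele
    ((0, 1, 0), (0, 1, 0), (0, 0, 1)),      # both het parents transmit
    ((0, 1, 0), (0, 0, 1), (0, 1, 0)),      # het father does not transmit
    ((0, 1, 0), (1, 0, 0), (0.5, 0.5, 0)),  # uncertain child call: dropped
]
b, c = tdt_counts(trios)
stat, p_value = tdt_test(b, c)
```

The last trio shows the issue the abstract raises: low-confidence calls are discarded or, with a looser threshold, risk being miscalled. If the chance of a confident call differs by genotype, the retained trios are no longer a representative sample, which is the motivation the abstract gives for modeling genotype-specific call rates.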