Scholz Markus, Hasenclever Dirk
University of Leipzig, Germany.
Int J Biostat. 2010;6(1):Article 1. doi: 10.2202/1557-4679.1162.
Measuring biallelic pair-wise association, known as linkage disequilibrium (LD), is important for understanding genomic architecture. A plethora of measures of association in two-by-two tables have been proposed in the literature. Besides the problem of choosing an appropriate measure, the problem of estimating it has been neglected. It needs to be emphasized that defining a measure and choosing an estimator for it are conceptually unrelated tasks. In this paper, we compare the performance of various estimators for the three popular LD measures D', r and Y in a simulation study for small to moderate sample sizes (N <= 500). The usual frequency plug-in estimators can lead to unreliable or undefined estimates. Estimators based on the computationally expensive volume measures have recently been proposed as a remedy for this well-known problem. We confirm that volume estimators have a lower expected mean square error than the naive plug-in estimators. However, they are outperformed by estimators that plug easy-to-calculate non-informative Bayesian probability estimates into the theoretical formulae for the measures. Fully Bayesian estimators with non-informative Dirichlet priors have comparable accuracy but are computationally more expensive. We recommend the use of non-informative Bayesian plug-in estimators based on Jeffreys' prior, in particular when dealing with SNP array data, where small table entries and table margins are likely to occur.
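The difference between the naive and the Bayesian plug-in estimators can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: Y is taken to be Yule's Y, the Bayesian probability estimate is taken to be the posterior mean under a Dirichlet(1/2, 1/2, 1/2, 1/2) prior (Jeffreys' prior for a four-cell multinomial), and the function names `ld_measures` and `plug_in` are hypothetical.

```python
import math

def ld_measures(p11, p12, p21, p22):
    """Return (D', r, Y) for 2x2 haplotype probabilities.

    Y is assumed here to be Yule's Y. Undefined measures
    (zero denominators) are returned as NaN.
    """
    p_a = p11 + p12            # frequency of the first allele at locus A
    p_b = p11 + p21            # frequency of the first allele at locus B
    d = p11 * p22 - p12 * p21  # raw disequilibrium coefficient D

    # D' scales D by its maximum attainable value given the margins.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else math.nan

    # r is the correlation coefficient of the two allele indicators.
    var = p_a * (1 - p_a) * p_b * (1 - p_b)
    r = d / math.sqrt(var) if var > 0 else math.nan

    # Yule's Y, computed from the square roots of the diagonal products.
    s1, s2 = math.sqrt(p11 * p22), math.sqrt(p12 * p21)
    y = (s1 - s2) / (s1 + s2) if (s1 + s2) > 0 else math.nan
    return d_prime, r, y

def plug_in(counts, pseudo=0.0):
    """Plug cell probability estimates into the LD formulae.

    pseudo=0.0 gives the naive frequency plug-in estimator;
    pseudo=0.5 gives the posterior mean under the assumed
    Jeffreys prior, i.e. (n_ij + 1/2) / (N + 2).
    """
    total = sum(counts) + 4 * pseudo
    probs = [(c + pseudo) / total for c in counts]
    return ld_measures(*probs)

# A table with an empty margin: counts are (n11, n12, n21, n22).
sparse = (10, 0, 0, 0)
naive = plug_in(sparse)            # all three measures are undefined
jeffreys = plug_in(sparse, 0.5)    # all three measures are finite
print(naive, jeffreys)
```

With an empty table margin the naive estimator has no defined value, whereas adding the pseudo-count of 1/2 to every cell keeps all probabilities strictly positive, so D', r and Y are always defined. The extra cost over the naive estimator is negligible, which matches the abstract's description of these estimates as easy to calculate.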