Department of Statistics and Finance, University of Science and Technology of China, Hefei, Anhui, China.
Comput Biol Chem. 2011 Feb;35(1):40-9. doi: 10.1016/j.compbiolchem.2010.12.006. Epub 2011 Jan 22.
Affected relatives are essential for pedigree linkage analysis; however, they violate the independent-sample assumption in case-control association studies. To avoid correlation between samples, a common practice is to take only one affected sample per pedigree in association analysis. Although several methods exist for handling correlated samples, they are still not widely used, in part because they are not easily implemented or because they are not widely known. We advocate the effective sample size method as a simple and accessible approach for case-control association analysis with correlated samples. This method modifies the chi-square test statistic, p-value, and 95% confidence interval of the odds ratio by replacing the apparent allele or genotype counts with the effective ones in the standard formula, without the need for specialized computer programs. We present a simple formula for calculating the effective sample size for many types of relative pairs and relative sets. For allele frequency estimation, the effective sample size method captures the variance inflation exactly. For genotype frequency, simulations showed that the effective sample size provides a satisfactory approximation. A gene previously identified as a type 1 diabetes susceptibility locus, the interferon-induced helicase gene (IFIH1), is shown to be significantly associated with rheumatoid arthritis when the effective sample size method is applied. This significant association is not established if only one affected sib per pedigree is used in the association analysis. The relationships between the effective sample size method and other methods - the generalized estimation equation, variance of eigenvalues for correlation matrices, and genomic controls - are discussed.
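The count-replacement idea described above can be illustrated with a minimal sketch. The abstract does not state the paper's formula, so this sketch assumes the common variance-of-the-mean form, n_eff = n^2 / (sum of all entries of the pairwise correlation matrix), and an allele-count correlation of 0.5 between full sibs. The function names (`effective_size`, `scale_counts`, `allelic_test`) and the allele counts are hypothetical, chosen only for illustration.

```python
import math

def effective_size(corr):
    """Effective number of independent observations for a sample mean,
    assuming n_eff = n^2 / (sum of all correlation-matrix entries)."""
    n = len(corr)
    total = sum(sum(row) for row in corr)
    return n * n / total

def scale_counts(counts, n_apparent, n_effective):
    """Shrink apparent counts to effective counts, keeping proportions."""
    factor = n_effective / n_apparent
    return [c * factor for c in counts]

def allelic_test(case_counts, control_counts):
    """Standard 2x2 chi-square (1 df), p-value, odds ratio, and
    Woolf-type 95% confidence interval for two-allele counts."""
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci = (math.exp(math.log(odds_ratio) - 1.96 * se_log_or),
          math.exp(math.log(odds_ratio) + 1.96 * se_log_or))
    return chi2, p_value, odds_ratio, ci

# Assumed correlation between the allele counts of two full sibs.
sib_pair_corr = [[1.0, 0.5],
                 [0.5, 1.0]]
n_eff_pair = effective_size(sib_pair_corr)  # 4/3 effective individuals per pair

# Hypothetical data: 100 affected sib pairs (200 cases), 200 unrelated controls.
case_alleles = [240, 160]      # apparent allele counts in cases
control_alleles = [200, 200]   # controls are independent, left unscaled
case_alleles_eff = scale_counts(case_alleles, n_apparent=2.0, n_effective=n_eff_pair)

naive = allelic_test(case_alleles, control_alleles)
adjusted = allelic_test(case_alleles_eff, control_alleles)
print("naive    chi2=%.3f p=%.4f OR=%.3f" % naive[:3], naive[3])
print("adjusted chi2=%.3f p=%.4f OR=%.3f" % adjusted[:3], adjusted[3])
```

Under these assumptions the odds ratio is unchanged, because scaling preserves the allele proportions in cases, while the chi-square statistic shrinks and the confidence interval widens to reflect the reduced information in correlated cases.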