Wu Cen, Cui Yuehua
Department of Statistics and Probability, Michigan State University, 619 Red Cedar Road, Rm C432, East Lansing, MI 48824, USA. Tel.: +1-517-432-7098; Fax: +1-517-432-1405
Brief Bioinform. 2014 Mar;15(2):279-91. doi: 10.1093/bib/bbs087. Epub 2013 Jan 15.
Set-based association studies built on genes or pathways have shown great promise in interpreting association signals for complex diseases. These approaches are particularly useful when the variants in a set have moderate effects that are difficult to detect with single-marker analysis, especially when the variants act jointly in a complicated manner. Set-based analyses either use a summary statistic, such as the maximum or average of the individual signals (e.g. chi-square statistics) over all variants in a set, or consider the joint distribution of those signals to assess the significance of the set. The resulting signal, however, can be diluted when noisy variants are not properly handled, inflating the rate of false negatives or false positives. Thus, the selection of disease-informative single-nucleotide polymorphisms (diSNPs) plays a crucial role in improving the power of a set-based association study. In this work, we propose an efficient diSNP selection method based on information theory. We select diSNPs by considering their relative information contribution to disease status, which differs from the usual tag SNP selection. The relative merit of pre-selecting diSNPs in a set-based association analysis is demonstrated through extensive simulation studies and real data analysis.
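To make the idea of an "information contribution to disease status" concrete, the following is a minimal sketch and not the selection rule proposed in the paper, which the abstract does not specify. It assumes that a SNP's contribution can be scored as the empirical mutual information I(G; D) between genotype G and disease status D, normalized by the entropy H(D). The function names, the toy genotypes, and the 0.05 cutoff are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Empirical Shannon entropy H(X) in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mutual_information(genotypes, status):
    """Empirical mutual information I(G; D) in bits."""
    n = len(genotypes)
    joint = Counter(zip(genotypes, status))
    g_counts = Counter(genotypes)
    d_counts = Counter(status)
    mi = 0.0
    for (g, d), c in joint.items():
        # p(g, d) * log2( p(g, d) / (p(g) * p(d)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (g_counts[g] * d_counts[d]))
    return mi


def relative_information(snps, status):
    """Assumed score: fraction of H(D) carried by each SNP, I(G; D) / H(D)."""
    h = entropy(status)
    if h == 0:
        raise ValueError("disease status is constant; H(D) = 0")
    return {name: mutual_information(g, status) / h for name, g in snps.items()}


def select_snps(snps, status, threshold=0.05):
    """Keep SNPs whose score exceeds a hypothetical cutoff."""
    scores = relative_information(snps, status)
    return [name for name, score in scores.items() if score > threshold]


# Toy data: 4 cases (1) and 4 controls (0); genotypes coded as 0/1/2 allele counts.
status = [1, 1, 1, 1, 0, 0, 0, 0]
snps = {
    "snp_a": [2, 2, 1, 2, 0, 0, 1, 0],  # genotype tracks disease status
    "snp_b": [0, 1, 2, 1, 0, 1, 2, 1],  # same distribution in cases and controls
}
scores = relative_information(snps, status)
selected = select_snps(snps, status)
```

In this toy example `snp_a` carries most of the information about disease status while `snp_b` carries none, so only `snp_a` would be passed on to a set-based test. A real analysis would also need to account for sampling noise in the estimated mutual information, for instance through permutation, which this sketch omits.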