Pongpanich Monnat, Neely Megan L, Tzeng Jung-Ying
Bioinformatics Research Center, North Carolina State University, Raleigh, NC, USA.
Front Genet. 2012 Jan 9;2:110. doi: 10.3389/fgene.2011.00110. eCollection 2011.
Methods that collapse information across genetic markers when searching for association signals are gaining momentum in the literature. Although originally developed to achieve a better balance between retaining information and controlling degrees of freedom when performing multimarker association analysis, these methods have recently proven to be powerful tools for identifying rare variants that contribute to complex phenotypes. The information among markers can be collapsed at the genotype level, which focuses on the mean of genetic information, or at the similarity level, which focuses on the variance of genetic information. The aim of this work is to understand the strengths and weaknesses of these two collapsing strategies. Our results show that neither collapsing strategy outperforms the other across all simulated scenarios. Two factors that dominate the performance of these strategies are the signal-to-noise ratio and the underlying genetic architecture of the causal variants. Genotype collapsing is more sensitive than similarity collapsing to the marker set being contaminated by noise loci. In addition, genotype collapsing performs best when the genetic architecture of the causal variants is not complex (e.g., causal loci with similar effects and similar frequencies). Similarity collapsing is more robust as the complexity of the genetic architecture increases and outperforms genotype collapsing when the genetic architecture of the marker set becomes more sophisticated (e.g., causal loci with various effect sizes or frequencies and potential non-linear or interactive effects). Because the underlying genetic architecture is not known a priori, we also considered a two-stage analysis that combines the two top-performing methods from different collapsing strategies. We find that it is reasonably robust across all simulated scenarios.
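The distinction between the two collapsing levels can be made concrete with a small sketch. The abstract does not specify which statistics were used, so the choices below are assumptions made purely for illustration: an unweighted sum of minor-allele counts stands in for genotype collapsing, and an identity-by-state (IBS) style pairwise similarity stands in for similarity collapsing. The function names `genotype_collapse` and `similarity_collapse` and the toy matrix are hypothetical.

```python
import numpy as np


def genotype_collapse(G):
    """Genotype-level collapsing (illustrative assumption): one score per
    subject, here the unweighted sum of minor-allele counts across loci."""
    return G.sum(axis=1)


def similarity_collapse(G):
    """Similarity-level collapsing (illustrative assumption): an IBS-style
    similarity between every pair of subjects. For genotypes coded 0/1/2,
    the per-locus similarity is 2 - |g_i - g_j|; it is averaged over loci
    and scaled to the range [0, 1]."""
    diff = np.abs(G[:, None, :] - G[None, :, :])  # subjects x subjects x loci
    return (2.0 - diff).mean(axis=2) / 2.0


# Toy data: 3 subjects (rows) x 4 loci (columns), minor-allele counts.
G = np.array([
    [0, 1, 0, 2],
    [0, 1, 0, 2],
    [2, 0, 1, 0],
])

burden = genotype_collapse(G)   # one number per subject
S = similarity_collapse(G)      # one number per pair of subjects

print(burden)
print(S)
```

In this toy matrix all three subjects receive the same collapsed genotype score, while the similarity matrix still separates the third subject from the first two. That is one way to see why the two strategies can respond differently to the underlying genetic architecture, although it says nothing about their relative power, which the abstract reports as scenario-dependent.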