Wang Wei-Bung, Jiang Tao
Department of Computer Science, University of California, Riverside, CA, USA.
Genome Inform. 2008;21:27-41.
Tag SNP selection is an important problem in computational biology and genetics because a small set of tag SNP markers may help reduce the cost of genotyping and thus of genome-wide association studies. Several methods for selecting as small a set of tag SNPs as possible have been investigated in the literature; they differ in how they formulate tag SNP selection (block-based or genome-wide) and in their mathematical models of marker correlation. In this paper, we propose a new model of multi-marker correlation for genome-wide tag SNP selection, and a simple greedy algorithm that selects as small a set of tag SNPs as possible according to the model. Our experimental results on several real datasets from the HapMap project demonstrate that the new model yields more succinct tag SNP sets than previous methods.
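As a rough illustration of what greedy tag SNP selection can look like, the following sketch uses a plain pairwise r² threshold in a set-cover style loop. This is not the multi-marker correlation model proposed in the paper, whose details the abstract does not give. The function names `r_squared` and `greedy_tag_snps`, the 0/1 haplotype encoding, and the default threshold of 0.8 are all assumptions made for the example.

```python
from itertools import combinations


def r_squared(a, b):
    """Pairwise r^2 between two biallelic SNPs given as equal-length 0/1 lists."""
    n = len(a)
    pa = sum(a) / n
    pb = sum(b) / n
    if pa in (0.0, 1.0) or pb in (0.0, 1.0):
        return 0.0  # a monomorphic SNP carries no correlation information
    pab = sum(1 for x, y in zip(a, b) if x == 1 and y == 1) / n
    d = pab - pa * pb  # linkage disequilibrium coefficient D
    return d * d / (pa * (1 - pa) * pb * (1 - pb))


def greedy_tag_snps(snps, threshold=0.8):
    """Greedy set-cover heuristic (hypothetical helper, pairwise model only).

    snps: list of SNPs, each a list of 0/1 alleles across the same haplotypes.
    Returns indices of the chosen tag SNPs in the order they were picked.
    """
    m = len(snps)
    # Each SNP covers itself plus every SNP it is correlated with.
    covers = [{i} for i in range(m)]
    for i, j in combinations(range(m), 2):
        if r_squared(snps[i], snps[j]) >= threshold:
            covers[i].add(j)
            covers[j].add(i)

    uncovered = set(range(m))
    tags = []
    while uncovered:
        # Pick the SNP covering the most uncovered SNPs; ties go to the lowest index.
        best = max(range(m), key=lambda i: (len(covers[i] & uncovered), -i))
        tags.append(best)
        uncovered -= covers[best]
    return tags
```

For instance, with four haplotypes and SNPs `[[0, 0, 1, 1], [0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 1]]`, the first three SNPs are perfectly correlated with one another and the fourth is independent of them, so this sketch picks one representative of the first group plus the fourth SNP. A greedy heuristic of this kind gives a small set but does not guarantee a minimum one.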