He Jingwu, Zelikovsky Alexander
Department of Computer Science, Georgia State University, Atlanta, GA 30303, USA.
Bioinformatics. 2006 Oct 15;22(20):2558-61. doi: 10.1093/bioinformatics/btl420. Epub 2006 Aug 7.
The search for associations between complex diseases and single nucleotide polymorphisms (SNPs) or haplotypes has recently received great attention. For these studies, it is essential to use a small subset of informative SNPs that accurately represents the rest of the SNPs. Informative SNP selection can achieve (1) considerable budget savings by genotyping only a limited number of SNPs and computationally inferring all other SNPs, or (2) the necessary reduction of huge SNP sets (obtained, e.g., from Affymetrix) for further fine haplotype analysis. A novel informative SNP selection method for unphased genotype data, based on multiple linear regression (MLR), is implemented in the software package MLR-tagging. This software can be used for informative SNP (tag) selection and genotype prediction. The stepwise tag selection algorithm (STSA) selects the positions of a given number of informative SNPs based on a sample of genotypes from the population. The MLR SNP prediction algorithm predicts a complete genotype from the values of its informative SNPs, their positions among all SNPs, and a sample of complete genotypes. An extensive experimental study on various datasets, including 10 regions from HapMap, shows that MLR prediction combined with stepwise tag selection uses fewer tags than the state-of-the-art method of Halperin et al. (2005).
The MLR-Tagging software package is publicly available at http://alla.cs.gsu.edu/~software/tagging/tagging.html
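The two steps named in the abstract, regression-based prediction and stepwise tag selection, can be illustrated with a minimal sketch. This is not the code of the MLR-Tagging package, and the abstract does not give the algorithmic details, so several choices here are assumptions: genotypes are coded 0/1/2, each non-tag SNP is fitted by ordinary least squares with an intercept, predictions are rounded to the nearest valid genotype, and tags are added greedily by training error. The function names `mlr_predict`, `training_errors`, and `stepwise_select` are hypothetical.

```python
import numpy as np


def mlr_predict(train_tags, train_target, query_tags):
    """Predict one SNP for query genotypes from their tag SNP values.

    Assumption: genotypes are coded 0/1/2 and the regression output is
    rounded to the nearest valid code.
    """
    # Design matrix with an intercept column followed by the tag SNPs.
    X = np.column_stack([np.ones(len(train_tags)), train_tags])
    coef, *_ = np.linalg.lstsq(X, train_target, rcond=None)
    Xq = np.column_stack([np.ones(len(query_tags)), query_tags])
    raw = Xq @ coef
    return np.clip(np.rint(raw), 0, 2).astype(int)


def training_errors(genotypes, tags):
    """Count mispredicted entries when every non-tag SNP is predicted
    from the tag SNPs on the sample itself."""
    errors = 0
    for j in range(genotypes.shape[1]):
        if j in tags:
            continue
        pred = mlr_predict(genotypes[:, tags], genotypes[:, j], genotypes[:, tags])
        errors += int(np.sum(pred != genotypes[:, j]))
    return errors


def stepwise_select(genotypes, k):
    """Greedy forward selection of k tag positions (an assumed criterion:
    add the SNP that minimises the training error at each step)."""
    tags = []
    for _ in range(k):
        candidates = [j for j in range(genotypes.shape[1]) if j not in tags]
        best = min(candidates, key=lambda j: training_errors(genotypes, tags + [j]))
        tags.append(best)
    return sorted(tags)
```

Given a sample matrix with one row per individual and one column per SNP, `stepwise_select(genotypes, k)` returns `k` column indices, and `mlr_predict` can then reconstruct any remaining column for new individuals typed only at those positions. Scoring on the training sample keeps the sketch short; a real evaluation would use held-out individuals.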