Bioinformatics and Biostatistics, Neogen GeneSeek, Lincoln, NE, 68504, USA.
Department of Animal Sciences, University of Wisconsin, Madison, WI, 53706, USA.
Anim Genet. 2020 Mar;51(2):306-310. doi: 10.1111/age.12916. Epub 2020 Jan 31.
Over the years, ad hoc procedures have been used to design SNP arrays, but the procedures and strategies have varied considerably from case to case. Recently, a multiple-objective, local optimization (MOLO) algorithm was proposed to select SNPs for SNP arrays. It maximizes the adjusted SNP information (E score) under multiple constraints, e.g. on MAF, the uniformness of SNP locations (U score), the inclusion of obligatory SNPs, and the number and size of gaps. In MOLO, each chromosome is split into equally spaced segments, and local optima are selected as the SNPs with the highest adjusted E score within each segment, conditional on the presence of obligatory SNPs. The computation of the adjusted E score, however, is empirical, and it does not provide a scalable balance between the uniformness of SNP locations and SNP informativeness. In addition, the MOLO objective function does not accommodate the selection of uniformly distributed SNPs. In the present study, we proposed a unified local function for optimally selecting SNPs, as an amendment to the MOLO algorithm. This new local function applies scalable weights to the uniformness and the informativeness of SNPs, which allows SNPs to be selected under varied scenarios. The results showed that weighting the U and E scores together led to a higher imputation concordance rate than using the U score or the E score alone. An evaluation of six commercial bovine SNP chips further confirmed this conclusion.
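The segment-wise selection idea can be illustrated with a minimal sketch. It is not the authors' MOLO implementation, and the abstract does not give the exact U or E formulas, so several assumptions are made here: the E score is taken as already computed and scaled to [0, 1]; the U score is a hypothetical stand-in that equals 1 at a segment's midpoint and falls linearly to 0 at its edges; obligatory SNPs simply fill their segment; and the local score is a hypothetical linear blend, w·U + (1 − w)·E. The names `Snp`, `uniformness`, `select_snps` and the weight `w` are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snp:
    name: str
    position: int          # base-pair coordinate on the chromosome
    e_score: float         # informativeness, ASSUMED precomputed and scaled to [0, 1]
    obligatory: bool = False


def uniformness(position: float, start: float, width: float) -> float:
    """Hypothetical U score: 1 at the segment midpoint, falling linearly to 0 at its edges."""
    center = start + width / 2
    return max(0.0, 1.0 - abs(position - center) / (width / 2))


def select_snps(snps, chrom_length, n_segments, w=0.5):
    """Pick SNPs segment by segment (illustrative sketch, not the published MOLO code).

    w weights uniformness against informativeness: w=1 uses U only, w=0 uses E only.
    """
    width = chrom_length / n_segments
    selected = []
    for k in range(n_segments):
        start = k * width
        end = start + width
        last = k == n_segments - 1
        # Half-open segment [start, end); the last one also includes the chromosome end.
        in_seg = [s for s in snps
                  if start <= s.position < end or (last and s.position == chrom_length)]
        if not in_seg:
            continue  # no candidates: this segment becomes a gap
        obligatory = [s for s in in_seg if s.obligatory]
        if obligatory:
            # Simplification: obligatory SNPs occupy the segment; no extra optimum is chosen.
            selected.extend(obligatory)
            continue

        def local_score(s):
            # Hypothetical weighted local function: w * U + (1 - w) * E.
            return w * uniformness(s.position, start, width) + (1 - w) * s.e_score

        selected.append(max(in_seg, key=local_score))
    return selected


if __name__ == "__main__":
    snps = [
        Snp("a", 10, 0.9),
        Snp("b", 50, 0.2),
        Snp("c", 120, 0.1, obligatory=True),
        Snp("d", 160, 0.8),
        Snp("e", 150, 0.3),
    ]
    for w in (0.0, 0.5, 1.0):
        print(w, [s.name for s in select_snps(snps, 200, 2, w)])
```

In this toy setup, w = 1 reduces the choice to the SNP nearest each segment midpoint, and w = 0 reduces it to the most informative SNP per segment; intermediate values trade the two off, which mirrors the kind of weighting between U and E described above.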