Faculty of Informatics, Universidad Politécnica de Madrid, R. 3306, Campus de Montegancedo, 28660 Boadilla del Monte, Madrid, Spain.
Artif Intell Med. 2010 Nov;50(3):193-201. doi: 10.1016/j.artmed.2010.05.010. Epub 2010 Jul 21.
This paper presents an optimization algorithm for the automatic selection of a minimal subset of tagging single nucleotide polymorphisms (SNPs).
The determination of the set of minimal tagging SNPs is approached as an optimization problem in which each tagged SNP can be covered by a single tagging SNP or by a pair of tagging SNPs. The problem is solved using an estimation of distribution algorithm (EDA) which takes advantage of the underlying topological structure defined by the SNP correlations to model the problem interactions. The EDA stochastically searches the constrained space of feasible solutions. It is evaluated across HapMap reference panel data sets.
The EDA was compared with a SAT solver, able to find the single-marker minimal tagging sets, and with the Tagger program. The percentage of reduction ranged from 10% to 43% in the number of tagging SNPs of the minimal multi-marker tagging set found by the EDA with respect to the other algorithms.
The introduced algorithm is effective for the identification of minimal multi-marker SNP sets, which considerably reduce the dimension of the tagging SNP set in comparison with single-marker sets. Other variants of the SNP problem can be treated following the same approach.
本文提出了一种用于自动选择最小标记单核苷酸多态性(SNP)子集的优化算法。
将最小标记 SNP 集的确定视为优化问题,其中每个标记 SNP 可以由单个标记 SNP 或由一对标记 SNP 覆盖。该问题通过利用 SNP 相关性定义的底层拓扑结构来解决,通过使用分布估计算法(EDA)来建模问题交互。EDA 随机搜索可行解的约束空间。它在 HapMap 参考面板数据集上进行了评估。
将 EDA 与能够找到单标记最小标记集的 SAT 求解器以及 Tagger 程序进行了比较。EDA 发现的最小多标记标记集的标记 SNP 数量相对于其他算法减少了 10%至 43%。
所提出的算法对于识别最小多标记 SNP 集是有效的,与单标记集相比,它大大降低了标记 SNP 集的维度。可以采用相同的方法处理 SNP 问题的其他变体。