Department of Chemical Engineering and Institute of Biotechnology and Chemical Engineering, I-Shou University, Kaohsiung, Taiwan.
PLoS One. 2012;7(5):e37018. doi: 10.1371/journal.pone.0037018. Epub 2012 May 18.
Possible single nucleotide polymorphism (SNP) interactions in breast cancer are usually not investigated in genome-wide association studies. We previously proposed a particle swarm optimization (PSO) method to compute these SNP interactions. However, PSO is not guaranteed to find the best result in every run, especially when high-dimensional data are investigated for SNP-SNP interactions.
METHODOLOGY/PRINCIPAL FINDINGS: In this study, we propose the IPSO algorithm to improve the reliability of PSO in identifying the best protective SNP barcodes (SNP combinations and genotypes with the maximum difference between cases and controls) associated with breast cancer. SNP barcodes containing different numbers of SNPs were computed. At each processing step, the top five SNP barcodes are retained and used to compute the next SNP barcode, which contains one more SNP. The performance of the proposed IPSO algorithm is evaluated on simulated data for 23 SNPs from six steroid hormone metabolism and signalling-related genes. Among the 23 SNPs, 13 displayed significant odds ratio (OR) values (1.268 to 0.848; p<0.05) for breast cancer. With the IPSO algorithm, the joint effects of SNP barcodes with two to seven SNPs show significantly decreasing OR values (0.84 to 0.57; p<0.05 to 0.001). With the PSO algorithm, only barcodes with two to four SNPs show significantly decreasing OR values (0.84 to 0.77; p<0.05 to 0.001). Across 20 simulations, the median of the maximum difference for each SNP barcode is higher for IPSO than for PSO. In the boxplots for each n-SNP barcode (n = 3 to 10), the interquartile ranges are narrower and the upper and lower hinges closer together for IPSO than for PSO, suggesting that IPSO is highly reliable for SNP barcode identification.
CONCLUSIONS/SIGNIFICANCE: Overall, the proposed IPSO algorithm is robust and provides exact identification of the best protective SNP barcodes for breast cancer.