Fang Shenying, Fang Xiangzhong, Xiong Momiao
Department of Epidemiology, The University of Texas M D Anderson Cancer Center, Houston, Texas 77030, USA.
BMC Dermatol. 2011 Jan 7;11:1. doi: 10.1186/1471-5945-11-1.
With the availability of large-scale genome-wide association study (GWAS) data, choosing an optimal set of single nucleotide polymorphisms (SNPs) for disease susceptibility prediction is a challenging task. This study aimed to identify, by searching GWAS data, a set of SNPs that predicts psoriasis.
The dataset comprised 2,798 samples and 451,724 SNPs. The search for a set of SNPs to predict susceptibility to psoriasis consisted of two steps. The first step searched the GWAS dataset for the top 1,000 SNPs with high accuracy for predicting psoriasis. The second step searched for an optimal SNP subset for predicting psoriasis. The sequential information bottleneck (sIB) method was compared with classical linear discriminant analysis (LDA) for classification performance.
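The two-step search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the study: the abstract does not say how SNPs were ranked or how the subset was searched, so the genotype-frequency rule in `score_subset` (a stand-in for sIB or LDA), the greedy forward selection in `greedy_subset`, and all function names are hypothetical. For brevity the score is computed on the same samples used to fit the rule, whereas the study reports performance on test data.

```python
from collections import Counter

def harmonic_mean_sens_spec(y_true, y_pred):
    """Harmonic mean of sensitivity and specificity (1 = case, 0 = control)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    n_pos = sum(t == 1 for t in y_true)
    n_neg = len(y_true) - n_pos
    sens = tp / n_pos if n_pos else 0.0
    spec = tn / n_neg if n_neg else 0.0
    return 2 * sens * spec / (sens + spec) if sens + spec > 0 else 0.0

def score_subset(genotypes, labels, columns):
    """Assumed stand-in classifier: call a sample a case when its genotype
    pattern over `columns` is relatively more frequent among cases."""
    keys = [tuple(row[c] for c in columns) for row in genotypes]
    case_counts = Counter(k for k, y in zip(keys, labels) if y == 1)
    ctrl_counts = Counter(k for k, y in zip(keys, labels) if y == 0)
    n_case = max(sum(case_counts.values()), 1)
    n_ctrl = max(sum(ctrl_counts.values()), 1)
    preds = [1 if case_counts[k] / n_case > ctrl_counts[k] / n_ctrl else 0
             for k in keys]
    return harmonic_mean_sens_spec(labels, preds)

def rank_snps(genotypes, labels, top_k):
    """Step 1: score every SNP on its own and keep the top_k indices."""
    n_snps = len(genotypes[0])
    scored = [(score_subset(genotypes, labels, [j]), j) for j in range(n_snps)]
    scored.sort(key=lambda s: (-s[0], s[1]))  # best score first, ties by index
    return [j for _, j in scored[:top_k]]

def greedy_subset(genotypes, labels, candidates, max_size):
    """Step 2 (assumed strategy): add the SNP that most improves the score,
    and stop when no candidate improves it."""
    chosen, best = [], 0.0
    while len(chosen) < max_size:
        trials = [(score_subset(genotypes, labels, chosen + [j]), j)
                  for j in candidates if j not in chosen]
        if not trials:
            break
        score, j = max(trials, key=lambda s: (s[0], -s[1]))
        if score <= best:
            break
        chosen.append(j)
        best = score
    return chosen, best

# Toy data: 8 samples (rows) x 3 SNPs (columns), genotypes coded 0/1/2.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_genotypes = [
    [2, 0, 0], [2, 1, 0], [1, 0, 1], [2, 1, 0],  # cases
    [0, 0, 0], [0, 1, 0], [1, 0, 0], [0, 1, 0],  # controls
]
top = rank_snps(toy_genotypes, toy_labels, top_k=2)
subset, score = greedy_subset(toy_genotypes, toy_labels, top, max_size=2)
print(top, subset, score)
```

In the study's setting, `top_k` would correspond to the 1,000 SNPs retained in the first step; an exhaustive search over subsets of that size is infeasible, which is why some heuristic such as the greedy strategy assumed here is needed.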
The best test harmonic mean of sensitivity and specificity for predicting psoriasis was 0.674 (95% CI: 0.650-0.698) with sIB, compared with only 0.520 (95% CI: 0.472-0.524) with LDA. These results indicate that the new classifier, sIB, performed better than LDA in this study.
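For reference, the performance measure reported above, the harmonic mean of sensitivity and specificity, can be written as follows. The symbols $\mathrm{Se}$, $\mathrm{Sp}$, and $H$ are notation introduced here, and the numbers in the worked line are hypothetical values chosen for illustration, not results from the study.

```latex
\[
  H = \frac{2\,\mathrm{Se}\,\mathrm{Sp}}{\mathrm{Se} + \mathrm{Sp}},
  \qquad
  \mathrm{Se} = \frac{TP}{TP + FN},
  \qquad
  \mathrm{Sp} = \frac{TN}{TN + FP}
\]
% Hypothetical example: Se = 0.80, Sp = 0.60
\[
  H = \frac{2 \times 0.80 \times 0.60}{0.80 + 0.60} = \frac{0.96}{1.40} \approx 0.686
\]
```

Because the harmonic mean never exceeds the arithmetic mean (0.70 in this example), a classifier cannot score well by favoring sensitivity at the expense of specificity, or the reverse.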
The fact that a small set of SNPs can predict disease status with an average accuracy of 68% makes it possible to use SNP data for psoriasis prediction.