Shah Shital C, Kusiak Andrew
Intelligent Systems Laboratory, MIE, 2139 Seamans Center, The University of Iowa, Iowa City, IA 52242-1527, USA.
Artif Intell Med. 2004 Jul;31(3):183-96. doi: 10.1016/j.artmed.2004.04.002.
Genomic studies provide large volumes of data with the number of single nucleotide polymorphisms (SNPs) ranging into thousands. The analysis of SNPs permits determining relationships between genotypic and phenotypic information as well as the identification of SNPs related to a disease. The growing wealth of information and advances in biology call for the development of approaches for discovery of new knowledge. One such area is the identification of gene/SNP patterns impacting cure/drug development for various diseases.
A new approach for predicting drug effectiveness is presented. The approach is based on data mining and genetic algorithms. A global search mechanism, weighted decision tree, decision-tree-based wrapper, a correlation-based heuristic, and the identification of intersecting feature sets are employed for selecting significant genes.
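The intersection step named above can be illustrated with a minimal sketch. It assumes each selection method (global search, decision-tree wrapper, correlation heuristic) has already produced its own set of candidate features. The function name `intersect_feature_sets`, the `min_votes` parameter, and the SNP identifiers are hypothetical and are not taken from the paper.

```python
from collections import Counter


def intersect_feature_sets(feature_sets, min_votes=None):
    """Return the features chosen by at least `min_votes` selectors.

    With the default (min_votes=None) a feature must appear in every
    set, i.e. the result is a strict intersection.
    """
    sets = [set(s) for s in feature_sets]
    if not sets:
        return []
    if min_votes is None:
        min_votes = len(sets)
    # Count how many selectors picked each feature.
    votes = Counter(f for s in sets for f in s)
    return sorted(f for f, n in votes.items() if n >= min_votes)


# Hypothetical SNP identifiers, for illustration only (not data from the paper).
ga_search = {"SNP_03", "SNP_11", "SNP_27", "SNP_40"}
tree_wrapper = {"SNP_03", "SNP_11", "SNP_19", "SNP_40"}
correlation = {"SNP_03", "SNP_08", "SNP_11", "SNP_40"}

significant = intersect_feature_sets([ga_search, tree_wrapper, correlation])
print(significant)  # ['SNP_03', 'SNP_11', 'SNP_40']
```

Lowering `min_votes` relaxes the strict intersection into a majority-style vote, which may be useful when the individual selectors disagree; whether the authors used such a relaxation is not stated in the abstract.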
The feature selection approach resulted in an 85% reduction in the number of features. The relative increases in cross-validation accuracy and specificity for the significant gene/SNP set were 10% and 3.2%, respectively.
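To make the two kinds of percentage reported here concrete, the sketch below shows how a feature reduction and a relative increase are typically computed. The abstract gives only the resulting percentages, so the feature counts and accuracy values used here are hypothetical placeholders, and the function names are illustrative.

```python
def reduction_pct(n_before, n_after):
    """Percentage reduction in a count, e.g. number of features."""
    if n_before <= 0:
        raise ValueError("n_before must be positive")
    return 100.0 * (n_before - n_after) / n_before


def relative_increase_pct(before, after):
    """Relative (not absolute) increase of a metric, in percent."""
    if before == 0:
        raise ValueError("before must be non-zero")
    return 100.0 * (after - before) / before


# Hypothetical values chosen only to illustrate the arithmetic.
print(reduction_pct(200, 30))                       # 200 features cut to 30
print(round(relative_increase_pct(0.70, 0.77), 1))  # accuracy 0.70 -> 0.77
```

A relative increase of 10% is therefore not the same as a gain of 10 percentage points: in the hypothetical case above, accuracy rises by 7 points.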
The feature selection approach was successfully applied to data sets for drug and placebo subjects. The number of features was significantly reduced while the quality of the extracted knowledge improved. The feature set intersection approach identified the most significant genes/SNPs. The results reported in the paper describe associations among SNPs that lead to patient-specific treatment protocols.