Hybrid genetic algorithm-neural network: feature extraction for unpreprocessed microarray data.

Affiliation

The John van Geest Cancer Research Centre, School of Science and Technology, Nottingham Trent University, UK.

Publication information

Artif Intell Med. 2011 Sep;53(1):47-56. doi: 10.1016/j.artmed.2011.06.008. Epub 2011 Jul 19.

Abstract

OBJECTIVE

Suitable techniques for microarray analysis have been widely researched, particularly for the study of marker genes expressed in a specific type of cancer. Most of the machine learning methods that have been applied to significant gene selection focus on the classification ability of the method rather than its selection ability. These methods also require the microarray data to be preprocessed before analysis takes place. The objective of this study is to develop a hybrid genetic algorithm-neural network (GANN) model that emphasises feature selection and can operate on unpreprocessed microarray data.

METHOD

The GANN is a hybrid model in which the fitness value of the genetic algorithm (GA) is based upon the number of samples correctly labelled by a standard feedforward artificial neural network (ANN). The model is evaluated using two benchmark microarray datasets with different array platforms and differing numbers of classes: a 2-class oligonucleotide microarray dataset for acute leukaemia and a 4-class complementary DNA (cDNA) microarray dataset for small round blue cell tumours (SRBCTs). The underlying concept of the GANN algorithm is to select highly informative genes by co-evolving both the GA fitness function and the ANN weights at the same time.
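The fitness idea described above can be illustrated with a minimal sketch. The abstract does not specify the chromosome encoding, network size, or genetic operators, so everything below is an assumption made for illustration: each chromosome holds a small set of feature (gene) indices together with the weights of a one-hidden-layer network, fitness is the count of correctly labelled samples, and evolution uses truncation selection with mutation only. The dataset is synthetic toy data, not microarray data, and all function names are hypothetical.

```python
import math
import random

def forward(weights, inputs, n_hidden, n_classes):
    """One-hidden-layer feedforward pass; returns the predicted class index."""
    n_in, pos, hidden, outputs = len(inputs), 0, [], []
    for _ in range(n_hidden):
        s = weights[pos + n_in]  # bias is stored after the input weights
        s += sum(weights[pos + i] * inputs[i] for i in range(n_in))
        hidden.append(math.tanh(s))
        pos += n_in + 1
    for _ in range(n_classes):
        s = weights[pos + n_hidden]  # output bias
        s += sum(weights[pos + j] * hidden[j] for j in range(n_hidden))
        outputs.append(s)
        pos += n_hidden + 1
    return outputs.index(max(outputs))

def fitness(chrom, X, y, n_hidden, n_classes):
    """Fitness = number of samples the network labels correctly."""
    genes, weights = chrom
    return sum(
        forward(weights, [row[g] for g in genes], n_hidden, n_classes) == label
        for row, label in zip(X, y)
    )

def random_chromosome(rng, n_total, n_selected, n_hidden, n_classes):
    """Assumed encoding: (selected feature indices, flat weight vector)."""
    genes = rng.sample(range(n_total), n_selected)
    n_w = n_hidden * (n_selected + 1) + n_classes * (n_hidden + 1)
    return genes, [rng.uniform(-1, 1) for _ in range(n_w)]

def mutate(rng, chrom, n_total, rate=0.1):
    """Perturb feature indices and weights together (assumed operator)."""
    genes, weights = list(chrom[0]), list(chrom[1])
    for i in range(len(genes)):
        if rng.random() < rate:
            candidate = rng.randrange(n_total)
            if candidate not in genes:  # keep the selected indices unique
                genes[i] = candidate
    weights = [w + rng.gauss(0, 0.3) if rng.random() < rate else w for w in weights]
    return genes, weights

def evolve(X, y, n_selected=2, n_hidden=3, n_classes=2,
           pop_size=30, generations=40, seed=0):
    """Truncation selection plus mutation; the top half survives unchanged."""
    rng = random.Random(seed)
    n_total = len(X[0])
    pop = [random_chromosome(rng, n_total, n_selected, n_hidden, n_classes)
           for _ in range(pop_size)]
    score = lambda c: fitness(c, X, y, n_hidden, n_classes)
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        elite = pop[: pop_size // 2]
        children = [mutate(rng, rng.choice(elite), n_total)
                    for _ in range(pop_size - len(elite))]
        pop = elite + children
    return max(pop, key=score)

def make_toy_data(n_samples=20, n_features=10, informative=3, seed=1):
    """Synthetic two-class data in which one feature separates the classes."""
    rng = random.Random(seed)
    X, y = [], []
    for k in range(n_samples):
        label = k % 2
        row = [rng.gauss(0, 1) for _ in range(n_features)]
        row[informative] += 4.0 if label else -4.0
        X.append(row)
        y.append(label)
    return X, y

X, y = make_toy_data()
best = evolve(X, y)
print("selected feature indices:", sorted(best[0]))
print("correctly labelled:", fitness(best, X, y, 3, 2), "of", len(y))
```

Because the feature indices and the network weights sit in the same chromosome, a single fitness evaluation scores both at once, which is one plausible reading of the co-evolution described above. A fuller implementation would presumably add crossover and evaluate on held-out samples.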

RESULTS

The novel GANN selected approximately 50% of the same genes as the original studies. This may indicate that these common genes are more biologically significant than other genes in the datasets. The remaining 50% of the significant genes identified were used to build predictive models, and for both datasets the models based on the set of genes extracted by the GANN method produced more accurate results. The results also suggest that the GANN method can not only detect genes that are exclusively associated with a single cancer type but also explore genes that are differentially expressed in multiple cancer types.

CONCLUSIONS

The results show that the GANN model has successfully extracted statistically significant genes from the unpreprocessed microarray data, as well as known biologically significant genes. We also show that assessing the biological significance of genes based on classification accuracy may be misleading. Although the GANN's set of extra genes proves to be more statistically significant than those selected by other methods, a biological assessment of these genes is highly recommended to confirm their functionality.
