Classification and biomarker identification using gene network modules and support vector machines.

Affiliation

The Institute of Applied Research-The Galilee Society, Shefa-Amr, Israel.

Publication information

BMC Bioinformatics. 2009 Oct 15;10:337. doi: 10.1186/1471-2105-10-337.

Abstract

BACKGROUND

Classification using microarray datasets is usually based on a small number of samples for which tens of thousands of gene expression measurements have been obtained. The selection of the genes most significant to the classification problem is a challenging issue in high-dimensional data analysis and interpretation. A previous study with SVM-RCE (Recursive Cluster Elimination) suggested that classification based on groups of correlated genes sometimes exhibits better performance than classification using single genes. Large databases of gene interaction networks provide an important resource for the analysis of genetic phenomena and for classification studies using interacting genes. We now demonstrate that an algorithm which integrates network information with recursive feature elimination based on SVM exhibits good performance and improves the biological interpretability of the results. We refer to the method as SVM with Recursive Network Elimination (SVM-RNE).

RESULTS

Initially, one thousand genes selected by t-test from a training set are filtered so that only genes that map to a gene network database remain. The Gene Expression Network Analysis Tool (GXNA) is applied to the remaining genes to form n clusters of genes that are highly connected in the network. A linear SVM is used to classify the samples using these clusters, and a weight is assigned to each cluster based on its importance to the classification. The least informative clusters are removed and the remainder are retained for the next classification step. This process is repeated until an optimal classification is obtained.
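The elimination loop described above can be illustrated with a minimal Python sketch. It is not the authors' Matlab implementation, and it rests on several assumptions: the gene clusters are supplied directly as a hypothetical dictionary standing in for GXNA output, the t-test filter and network mapping steps are omitted, each cluster is scored by the training accuracy of a linear SVM restricted to its genes (the abstract states only that the weight reflects importance to the classification), and the SVM is fitted with a plain batch subgradient method rather than a library solver. All function names, parameters, and the toy data are illustrative.

```python
# Hypothetical sketch of recursive cluster elimination with a linear SVM.
# Cluster membership, scoring rule and toy data are assumptions, not the
# published SVM-RNE implementation.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit a linear SVM (labels in {-1, +1}) by batch subgradient descent
    on the L2-regularised hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:  # sample violates the margin
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def accuracy(X, y, genes):
    """Training accuracy of a linear SVM that sees only the given gene columns."""
    Xs = [[row[j] for j in genes] for row in X]
    w, b = train_linear_svm(Xs, y)
    preds = [1 if dot(w, xi) + b >= 0 else -1 for xi in Xs]
    return sum(p == yi for p, yi in zip(preds, y)) / len(y)

def recursive_network_elimination(X, y, clusters, drop_fraction=0.1):
    """Repeatedly score clusters, drop the lowest-scoring ones, and record
    the accuracy obtained with the genes of the surviving clusters."""
    clusters = dict(clusters)
    history = []
    while clusters:
        genes = sorted(g for members in clusters.values() for g in members)
        history.append((sorted(clusters), accuracy(X, y, genes)))
        if len(clusters) == 1:
            break
        scores = {name: accuracy(X, y, members)
                  for name, members in clusters.items()}
        n_drop = max(1, int(drop_fraction * len(clusters)))
        for name in sorted(scores, key=scores.get)[:n_drop]:
            del clusters[name]
    return history

# Toy data: 6 samples x 5 genes. Genes 0-1 separate the classes, genes 2-3
# carry no class signal, gene 4 is only partly informative.
X = [[2, 2, 1, 0, 1], [3, 2, 0, 1, 1], [2, 3, -1, -1, -1],
     [-2, -2, 1, 0, -1], [-3, -2, 0, 1, -1], [-2, -3, -1, -1, 1]]
y = [1, 1, 1, -1, -1, -1]
clusters = {"module_A": [0, 1], "module_B": [2, 3], "module_C": [4]}

history = recursive_network_elimination(X, y, clusters)
for surviving, acc in history:
    print(surviving, round(acc, 3))
```

In practice the per-level accuracy would be estimated on held-out samples or by cross-validation rather than on the training set, and the level with the best estimate would be taken as the final set of network modules.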

CONCLUSION

More than 90% accuracy can be obtained in the classification of selected microarray datasets by integrating the interaction network information with the gene expression information from the microarrays. The Matlab version of SVM-RNE can be downloaded from http://web.macam.ac.il/~myousef.
