An algorithm for finding biologically significant features in microarray data based on a priori manifold learning.

Authors

Hira Zena M, Trigeorgis George, Gillies Duncan F

Affiliation

Department of Computing, Imperial College London, London, United Kingdom.

Publication

PLoS One. 2014 Mar 3;9(3):e90562. doi: 10.1371/journal.pone.0090562. eCollection 2014.

Abstract

Microarray databases are a large source of genetic data which, upon proper analysis, could enhance our understanding of biology and medicine. Many microarray experiments have been designed to investigate the genetic mechanisms of cancer, and analytical approaches have been applied to classify different types of cancer or to distinguish between cancerous and non-cancerous tissue. However, microarrays are high-dimensional datasets with high levels of noise, and this causes problems when using machine learning methods. A popular approach to this problem is to search for a set of features that will simplify the structure and, to some degree, remove the noise from the data. The most widely used approach to feature extraction is principal component analysis (PCA), which assumes a multivariate Gaussian model of the data. More recently, non-linear methods have been investigated. Among these, manifold learning algorithms, for example Isomap, aim to project the data from a higher-dimensional space onto a lower-dimensional one. We have proposed a priori manifold learning for finding a manifold in which a representative set of microarray data is fused with relevant data taken from the KEGG pathway database. Once the manifold has been constructed, the raw microarray data is projected onto it, and clustering and classification can take place. In contrast to earlier fusion-based methods, the prior knowledge from the KEGG databases is not used in, and does not bias, the classification process; it merely acts as an aid to find the best space in which to search the data. In our experiments we have found that our new manifold method gives better classification results than either PCA or conventional Isomap.
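
To make the contrast between a linear method such as PCA and a manifold method such as Isomap more concrete, the following is a minimal sketch of the step that conventional Isomap adds: approximating geodesic distances by shortest paths on a nearest-neighbour graph. It is not the authors' a priori manifold method and does not use any KEGG data; the toy semicircle data, the function names (`euclidean`, `geodesic_distances`) and the choice of two neighbours are assumptions made purely for illustration.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def geodesic_distances(points, n_neighbors):
    """Approximate geodesic distances as shortest paths on a k-NN graph.

    This mirrors the graph-distance step of conventional Isomap;
    the embedding step (classical MDS) is omitted for brevity.
    """
    n = len(points)
    inf = float("inf")
    dist = [[0.0 if i == j else inf for j in range(n)] for i in range(n)]

    # Connect every point to its n_neighbors closest points (symmetric edges).
    for i in range(n):
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: euclidean(points[i], points[j]))
        for j in others[:n_neighbors]:
            d = euclidean(points[i], points[j])
            dist[i][j] = d
            dist[j][i] = d

    # Floyd-Warshall: all-pairs shortest paths over the neighbour graph.
    for k in range(n):
        for i in range(n):
            d_ik = dist[i][k]
            for j in range(n):
                if d_ik + dist[k][j] < dist[i][j]:
                    dist[i][j] = d_ik + dist[k][j]
    return dist


# Hypothetical toy data: 21 points on a unit semicircle (a curved 1-D manifold in 2-D).
points = [(math.cos(math.pi * t / 20), math.sin(math.pi * t / 20)) for t in range(21)]

geo = geodesic_distances(points, n_neighbors=2)
straight = euclidean(points[0], points[-1])  # distance through the ambient space
along = geo[0][-1]                           # distance along the curve

print(f"Euclidean distance between the end points: {straight:.3f}")
print(f"Graph (geodesic) distance between them:    {along:.3f}")
```

On this toy curve the two end points are 2 apart in a straight line but close to π apart along the arc. A linear projection only sees the first quantity, whereas Isomap builds its low-dimensional embedding from the second.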
