Swarnkar Tripti, Mitra Pabitra
Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur 721 302, India
J Biosci. 2015 Oct;40(4):755-67. doi: 10.1007/s12038-015-9559-8.
A challenge in bioinformatics is to analyse the large volumes of gene expression data generated through microarray experiments and to extract useful information from them. Most microarray studies therefore demand complex data analysis to infer biologically meaningful information from such high-throughput data. Selecting informative genes is an important analysis step: it identifies a set of genes that can help uncover the biological information embedded in microarray data, and thus assists in the diagnosis, prognosis and treatment of disease. In this article we present an unsupervised feature selection technique that attempts to address the goal of explorative data analysis by unfolding the multi-faceted nature of the data. It focuses on extracting multiple clustering views from high-dimensional data while taking the diversity of each view into account. We evaluated our technique on benchmark data sets, and the experimental results indicate the potential and effectiveness of the proposed model in comparison with traditional single-view clustering models, as well as with other existing methods used in the literature for the studied data sets.