Department of Computer Science and Engineering, University of Kalyani, Kalyani, West Bengal, India.
PLoS One. 2010 Nov 12;5(11):e13803. doi: 10.1371/journal.pone.0013803.
With the advancement of microarray technology, it is now possible to study the expression profiles of thousands of genes simultaneously across different experimental conditions or tissue samples. Microarray cancer datasets, organized in a samples-versus-genes fashion, are used to classify tissue samples as benign or malignant, or into their subtypes. They are also useful for identifying potential gene markers for each cancer subtype, which aids in the successful diagnosis of particular cancer types. In this article, we present an unsupervised cancer classification technique based on multiobjective genetic clustering of the tissue samples. A real-coded encoding of the cluster centers is used, and cluster compactness and separation are optimized simultaneously. The resulting set of near-Pareto-optimal solutions contains a number of non-dominated solutions. We propose a novel approach that combines the clustering information possessed by these non-dominated solutions through a Support Vector Machine (SVM) classifier. The final clustering is obtained by consensus among the clusterings yielded by different kernel functions. The performance of the proposed multiobjective clustering method is compared with that of several other microarray clustering algorithms on three publicly available benchmark cancer datasets. Moreover, statistical significance tests have been conducted to establish the statistical superiority of the proposed method. Furthermore, relevant gene markers have been identified using the clustering result produced by the proposed method and are demonstrated visually. Biological relationships among the gene markers are also studied based on gene ontology. The results are promising and could have an important impact on unsupervised cancer classification as well as on gene marker identification for multiple cancer subtypes.
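The abstract does not spell out the objective functions or the genetic operators. The Python sketch below is therefore only a minimal illustration of the representation and selection idea the abstract describes. A real-coded chromosome is a flat list holding K cluster centers. Each chromosome is scored on two objectives, and the chromosomes that no other chromosome dominates are kept.

Several parts of the sketch are assumptions rather than the paper's method:

- Compactness is taken as the sum of squared distances from each sample to its nearest center, and is minimized.
- Separation is taken as the smallest distance between any two centers, and is maximized.
- The function names `decode`, `objectives`, `dominates`, and `non_dominated` are hypothetical.

Crossover, mutation, and the SVM-based combination step are not shown.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def decode(chromosome: Sequence[float], k: int, dim: int) -> List[Point]:
    """Split a flat real-coded chromosome into k cluster centers of length dim."""
    if len(chromosome) != k * dim:
        raise ValueError("chromosome length must equal k * dim")
    return [tuple(chromosome[i * dim:(i + 1) * dim]) for i in range(k)]


def objectives(chromosome: Sequence[float], k: int,
               data: Sequence[Point]) -> Tuple[float, float]:
    """Return (compactness, separation) for one chromosome.

    Assumed, illustrative definitions (not necessarily the paper's indices):
    compactness = sum of squared distances from each sample to its nearest
                  center (to be minimized);
    separation  = smallest Euclidean distance between two centers
                  (to be maximized).
    """
    if k < 2:
        raise ValueError("separation needs at least two centers")
    centers = decode(chromosome, k, len(data[0]))
    compactness = sum(
        min(sum((a - b) ** 2 for a, b in zip(x, c)) for c in centers)
        for x in data
    )
    separation = min(
        math.dist(centers[i], centers[j])
        for i in range(k) for j in range(i + 1, k)
    )
    return compactness, separation


def dominates(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """True if score a dominates score b: no worse in both objectives
    (lower compactness, higher separation) and strictly better in one."""
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better


def non_dominated(population: Sequence[Sequence[float]], k: int,
                  data: Sequence[Point]) -> List[Sequence[float]]:
    """Keep the chromosomes that no other chromosome in the population dominates."""
    scores = [objectives(ch, k, data) for ch in population]
    return [
        ch for i, ch in enumerate(population)
        if not any(dominates(scores[j], scores[i])
                   for j in range(len(population)) if j != i)
    ]
```

As a worked case, take the 2-D points `(0, 0)`, `(0, 1)`, `(5, 5)` and `(5, 6)` with `k=2`:

- `[0.0, 0.5, 5.0, 5.5]` scores compactness 1.0 and separation about 7.07.
- `[0.0, 0.0, 0.0, 1.0]` scores compactness 91.0 and separation 1.0. It is dominated by the first chromosome and is dropped.
- `[-10.0, -10.0, 20.0, 20.0]` scores compactness 1292.0 and separation about 42.43. It survives as a trade-off solution, because no other chromosome beats it on separation.

This shows why a multiobjective search returns a set of non-dominated clusterings rather than a single one, which then has to be combined in a later step.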