Multi-class clustering of cancer subtypes through SVM based ensemble of pareto-optimal solutions for gene marker identification.

Affiliation

Department of Computer Science and Engineering, University of Kalyani, Kalyani, West Bengal, India.

Publication information

PLoS One. 2010 Nov 12;5(11):e13803. doi: 10.1371/journal.pone.0013803.

DOI: 10.1371/journal.pone.0013803

PMID: 21103052

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2980474/

Abstract

With the advancement of microarray technology, it is now possible to study the expression profiles of thousands of genes across different experimental conditions or tissue samples simultaneously. Microarray cancer datasets, organized in a samples-versus-genes fashion, are being used to classify tissue samples as benign or malignant, or into their subtypes. They are also useful for identifying potential gene markers for each cancer subtype, which aids the diagnosis of particular cancer types. In this article, we present an unsupervised cancer classification technique based on multiobjective genetic clustering of the tissue samples. A real-coded encoding of the cluster centers is used, and cluster compactness and separation are optimized simultaneously. The resulting set of near-Pareto-optimal solutions contains a number of non-dominated solutions. We propose a novel approach that combines the clustering information held by these non-dominated solutions through a Support Vector Machine (SVM) classifier. The final clustering is obtained by consensus among the clusterings yielded by different kernel functions. The performance of the proposed multiobjective clustering method has been compared with that of several other microarray clustering algorithms on three publicly available benchmark cancer datasets. Moreover, statistical significance tests have been conducted to establish the statistical superiority of the proposed clustering method. Furthermore, relevant gene markers have been identified using the clustering result produced by the proposed method and demonstrated visually. Biological relationships among the gene markers are also studied based on gene ontology. The results are promising and could have an important impact on unsupervised cancer classification as well as on gene marker identification for multiple cancer subtypes.
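Two ideas from the abstract can be illustrated compactly: scoring candidate clusterings on two conflicting objectives and keeping only the non-dominated ones, then merging several labelings by consensus. The Python sketch below is a simplified, hypothetical illustration, not the authors' implementation. It assumes compactness is the sum of squared distances from each sample to its nearest center (minimized) and separation is the smallest distance between any two centers (maximized); the paper's actual validity indices, its genetic search over real-coded centers, and its SVM training step are not reproduced. The candidate center sets are hand-picked stand-ins for a GA population, and `majority_vote` assumes the labelings (for example, one per SVM kernel) already use matching cluster labels. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Scores = Tuple[float, float]  # (compactness, separation)


def dist2(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two samples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(data: Sequence[Point], centers: Sequence[Point]) -> List[int]:
    """Label each sample with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda k: dist2(p, centers[k])) for p in data]


def objectives(data: Sequence[Point], centers: Sequence[Point]) -> Scores:
    """Assumed objectives: compactness (minimize) and separation (maximize)."""
    labels = assign(data, centers)
    compactness = sum(dist2(p, centers[k]) for p, k in zip(data, labels))
    separation = min(
        math.sqrt(dist2(centers[i], centers[j]))
        for i in range(len(centers))
        for j in range(i + 1, len(centers))
    )
    return compactness, separation


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better


def non_dominated(scores: Sequence[Scores]) -> List[int]:
    """Indices of candidates that no other candidate dominates."""
    return [
        i for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


def majority_vote(labelings: Sequence[Sequence[int]]) -> List[int]:
    """Per-sample majority label across several label-aligned labelings."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*labelings)]


# Toy data: two obvious groups of two samples each.
data = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
# Hand-picked candidate center sets standing in for a GA population.
candidates = [
    [(0.0, 0.5), (5.0, 5.5)],      # tight and well separated
    [(0.0, 0.0), (5.0, 5.0)],      # same separation, less compact
    [(2.5, 3.0), (2.6, 3.0)],      # poor on both objectives
    [(-10.0, 0.0), (10.0, 10.0)],  # far apart, not compact
]
scores = [objectives(data, c) for c in candidates]
front = non_dominated(scores)
print(front)  # [0, 3]

# Consensus over three already-aligned labelings (e.g. one per kernel).
print(majority_vote([[0, 0, 1, 1], [0, 1, 1, 1], [0, 0, 1, 0]]))  # [0, 0, 1, 1]
```

Here the front keeps candidate 0 (most compact) and candidate 3 (most separated), since neither beats the other on both objectives, while candidates 1 and 2 are dominated by candidate 0. In a real ensemble, cluster labels from different solutions would have to be aligned before any vote is taken.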

Similar articles

Combining Pareto-optimal clusters using supervised learning for identifying co-expressed genes.

BMC Bioinformatics. 2009 Jan 20;10:27. doi: 10.1186/1471-2105-10-27.

Multiobjective Simulated Annealing-Based Clustering of Tissue Samples for Cancer Diagnosis.

IEEE J Biomed Health Inform. 2016 Mar;20(2):691-8. doi: 10.1109/JBHI.2015.2404971. Epub 2015 Feb 20.

Gene expression data analysis using multiobjective clustering improved with SVM based ensemble.

In Silico Biol. 2011;11(1-2):19-27. doi: 10.3233/ISB-2012-0441.

Gene-expression-based cancer subtypes prediction through feature selection and transductive SVM.

IEEE Trans Biomed Eng. 2013 Apr;60(4):1111-7. doi: 10.1109/TBME.2012.2225622. Epub 2012 Oct 18.

Reliable classification of two-class cancer data using evolutionary algorithms.

Biosystems. 2003 Nov;72(1-2):111-29. doi: 10.1016/s0303-2647(03)00138-2.

Gene expression data clustering using a multiobjective symmetry based clustering technique.

Comput Biol Med. 2013 Nov;43(11):1965-77. doi: 10.1016/j.compbiomed.2013.07.021. Epub 2013 Sep 7.

BMC Bioinformatics. 2007 Jun 16;8:206. doi: 10.1186/1471-2105-8-206.

Simultaneous gene clustering and subset selection for sample classification via MDL.

Bioinformatics. 2003 Jun 12;19(9):1100-9. doi: 10.1093/bioinformatics/btg039.

A centroid-based gene selection method for microarray data classification.

J Theor Biol. 2016 Jul 7;400:32-41. doi: 10.1016/j.jtbi.2016.03.034. Epub 2016 Apr 4.

Cited by

Tumor classification and biomarker discovery based on the 5'isomiR expression level.

BMC Cancer. 2019 Feb 7;19(1):127. doi: 10.1186/s12885-019-5340-y.

Continuity of transcriptomes among colorectal cancer subtypes based on meta-analysis.

Genome Biol. 2018 Sep 25;19(1):142. doi: 10.1186/s13059-018-1511-4.

In-silico interaction-resolution pathway activity quantification and application to identifying cancer subtypes.

BMC Med Inform Decis Mak. 2016 Jul 18;16 Suppl 1(Suppl 1):55. doi: 10.1186/s12911-016-0295-2.

Identifying Cancer Biomarkers From Microarray Data Using Feature Selection and Semisupervised Learning.

IEEE J Transl Eng Health Med. 2014 Dec 2;2:4300211. doi: 10.1109/JTEHM.2014.2375820. eCollection 2014.

Contribution of bioinformatics prediction in microRNA-based cancer therapeutics.

Adv Drug Deliv Rev. 2015 Jan;81:94-103. doi: 10.1016/j.addr.2014.10.030. Epub 2014 Nov 6.

A novel biclustering approach to association rule mining for predicting HIV-1-human protein interactions.

PLoS One. 2012;7(4):e32289. doi: 10.1371/journal.pone.0032289. Epub 2012 Apr 23.
