Sachnev Vasily, Saraswathi Saras, Niaz Rashid, Kloczkowski Andrzej, Suresh Sundaram
Department of Information, Communication and Electronics Engineering, Catholic University of Korea, Bucheon, Republic of Korea.
Battelle Center for Mathematical Medicine at The Research Institute at Nationwide Children's Hospital; currently at Sidra Medical and Research Center, Doha, Qatar.
BMC Bioinformatics. 2015 May 20;16:166. doi: 10.1186/s12859-015-0565-5.
Traditional cancer treatments have centered on cytotoxic drugs and general-purpose chemotherapy that may not be tailored to specific cancers. Identifying molecular markers related to different types of cancer might lead to the discovery of drugs that are patient- and disease-specific. This study uses microarray gene expression cancer data to identify biomarkers that are indicative of different types of cancer. Our aim is to provide a multi-class cancer classifier that can simultaneously differentiate between cancers and identify type-specific biomarkers, by combining the Binary Coded Genetic Algorithm (BCGA) with a neural-network-based Extreme Learning Machine (ELM) algorithm.
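To make the ELM component concrete, here is a minimal sketch of a standard Extreme Learning Machine classifier: a single-hidden-layer network whose input weights and biases are drawn at random and kept fixed, while the output weights are solved in closed form as $\beta = H^{\dagger} T$, where $H$ is the hidden-layer output matrix, $H^{\dagger}$ its Moore-Penrose pseudoinverse and $T$ the one-hot target matrix. The sigmoid activation, standard-normal weight initialization, class name `ELM` and `n_hidden` value are illustrative assumptions; the abstract does not state the exact ELM configuration used in the study.

```python
import numpy as np


class ELM:
    """Minimal single-hidden-layer Extreme Learning Machine classifier (sketch).

    Assumptions for illustration: sigmoid hidden units, standard-normal random
    input weights and biases, and a user-chosen hidden-layer size.
    """

    def __init__(self, n_hidden=20, seed=0):
        self.n_hidden = n_hidden  # assumed hidden-layer size
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid of the fixed random projection (activation is an assumption).
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        # One-hot target matrix T: one row per sample, one column per class.
        T = (y[:, None] == self.classes_[None, :]).astype(float)
        # Input weights and biases are drawn once at random and never trained.
        self.W = self.rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        H = self._hidden(X)
        # Output weights in closed form via the pseudoinverse (least squares).
        self.beta = np.linalg.pinv(H) @ T
        return self

    def predict(self, X):
        # Each sample is assigned the class with the largest output score.
        scores = self._hidden(X) @ self.beta
        return self.classes_[np.argmax(scores, axis=1)]
```

Usage follows the familiar fit/predict pattern, e.g. `ELM(n_hidden=20).fit(X_train, y_train).predict(X_test)`, where rows of `X_train` are samples and columns are gene expression values. Because only `beta` is learned and it is obtained in a single linear solve, training is fast enough to be called many times inside a search loop.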
BCGA and ELM are combined to select a subset of the genes in the Global Cancer Mapping (GCM) data set. According to the literature, this set of candidate genes contains over 52 biomarkers related to multiple cancers, including APOA1, VEGFC, YWHAZ, B2M, EIF2S1, CCR9 and many other genes that have been associated with the hallmarks of cancer. BCGA-ELM was tested on several cancer data sets and the results were compared with those of other classification methods; in terms of accuracy, BCGA-ELM matches or exceeds the other algorithms. We also show that over 50% of the genes selected by BCGA-ELM on the GCM data are cancer-related biomarkers.
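The combination of a binary-coded genetic algorithm with an ELM can be read as a wrapper feature-selection loop. The sketch below is a generic version under stated assumptions, not the authors' BCGA: each chromosome is a boolean mask over genes, and its fitness is the validation accuracy of an ELM trained on the selected genes minus a small penalty on the fraction of genes used. The operators (binary tournament selection, uniform crossover, bit-flip mutation, elitism), the penalty term and every parameter value, as well as the function names `elm_accuracy`, `fitness` and `bcga_select`, are hypothetical choices for illustration.

```python
import numpy as np


def elm_accuracy(X_tr, y_tr, X_va, y_va, n_hidden, rng):
    """Train a basic sigmoid ELM on (X_tr, y_tr); return accuracy on (X_va, y_va)."""
    classes = np.unique(y_tr)
    T = (y_tr[:, None] == classes[None, :]).astype(float)  # one-hot targets
    W = rng.standard_normal((X_tr.shape[1], n_hidden))      # fixed random weights
    b = rng.standard_normal(n_hidden)
    hidden = lambda X: 1.0 / (1.0 + np.exp(-(X @ W + b)))
    beta = np.linalg.pinv(hidden(X_tr)) @ T                 # closed-form output weights
    pred = classes[np.argmax(hidden(X_va) @ beta, axis=1)]
    return float(np.mean(pred == y_va))


def fitness(mask, data, n_hidden, penalty, rng):
    # Hypothetical fitness: validation accuracy minus a penalty on gene count.
    if not mask.any():
        return 0.0
    X_tr, y_tr, X_va, y_va = data
    acc = elm_accuracy(X_tr[:, mask], y_tr, X_va[:, mask], y_va, n_hidden, rng)
    return acc - penalty * mask.mean()


def tournament(pop, fits, rng):
    # Binary tournament: keep the fitter of two randomly chosen chromosomes.
    a, b = rng.integers(len(pop), size=2)
    return pop[a] if fits[a] >= fits[b] else pop[b]


def bcga_select(data, n_genes, pop_size=20, n_gen=30, p_mut=0.02,
                p_init=0.2, n_hidden=20, penalty=0.05, seed=0):
    """Generic binary GA over gene masks (all settings are assumptions)."""
    rng = np.random.default_rng(seed)
    pop = rng.random((pop_size, n_genes)) < p_init  # boolean chromosomes
    best_mask, best_fit = None, -np.inf
    for _ in range(n_gen):
        # Fitness is noisy: each evaluation draws fresh random ELM weights.
        fits = np.array([fitness(m, data, n_hidden, penalty, rng) for m in pop])
        i = int(np.argmax(fits))
        if fits[i] > best_fit:
            best_fit, best_mask = fits[i], pop[i].copy()
        children = [best_mask.copy()]  # elitism: carry the best mask forward
        while len(children) < pop_size:
            p1, p2 = tournament(pop, fits, rng), tournament(pop, fits, rng)
            child = np.where(rng.random(n_genes) < 0.5, p1, p2)  # uniform crossover
            child ^= rng.random(n_genes) < p_mut                 # bit-flip mutation
            children.append(child)
        pop = np.array(children)
    return best_mask, float(best_fit)
```

The returned mask marks the selected genes; in a biomarker setting those column indices would then be mapped back to gene symbols for literature checks. Penalizing the fraction of selected genes is one simple way to favor compact gene sets; the paper's own mechanism for controlling subset size is not described in the abstract.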
We were able to simultaneously differentiate between 14 different types of cancer using only 92 genes, achieving a multi-class classification accuracy of 95.4%, which is between 21.6% and 38% higher than other multi-class cancer classification results reported in the literature. Our findings suggest that computational algorithms such as BCGA-ELM can facilitate biomarker-driven integrated cancer research that can lead to a detailed understanding of the complexities of cancer.