Misman Muhammad Faiz, Mohamad Mohd Saberi, Deris Safaai, Hashim Siti Zaiton Mohd
Int J Data Min Bioinform. 2014;10(2):146-61. doi: 10.1504/ijdmb.2014.064013.
The pathway-based approach to microarray classification has opened a new era of genomic research. However, this approach is limited by the quality of pathway data. Pathway data are usually curated from the biological literature, and this collection process is context-free: it does not take the specific biological experiment (e.g., a lung cancer experiment) into account, so pathways often contain genes that are uninformative for that experiment. Many methods in this approach neglect this limitation and treat all genes in a pathway as significant. In this paper, we propose a hybrid of the support vector machine and smoothly clipped absolute deviation with group-specific tuning parameters (gSVM-SCAD) to select informative genes within pathways before the pathway evaluation process. Our experiments on canine, gender and lung cancer datasets show that gSVM-SCAD obtains significant results in identifying significant genes and pathways and in classification accuracy.
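As a rough illustration of the penalty named above, the following sketch evaluates the standard smoothly clipped absolute deviation (SCAD) penalty of Fan and Li, with one tuning parameter per gene group. The abstract does not give the exact gSVM-SCAD formulation, so the function names (`scad_penalty`, `grouped_scad_penalty`), the group encoding, and the default `a = 3.7` are assumptions made for illustration only, not the authors' implementation.

```python
import numpy as np


def scad_penalty(w, lam, a=3.7):
    """Standard SCAD penalty, applied elementwise to the weights w.

    lam may be a scalar or an array with one value per weight.
    a = 3.7 is the value commonly suggested for SCAD (an assumption here).
    """
    w_abs = np.abs(np.asarray(w, dtype=float))
    lam = np.broadcast_to(np.asarray(lam, dtype=float), w_abs.shape)

    # Region 1: |w| <= lam, behaves like an L1 (lasso) penalty.
    linear = lam * w_abs
    # Region 2: lam < |w| <= a*lam, quadratic transition.
    quadratic = -(w_abs**2 - 2 * a * lam * w_abs + lam**2) / (2 * (a - 1))
    # Region 3: |w| > a*lam, constant, so large weights are not shrunk further.
    constant = (a + 1) * lam**2 / 2

    return np.where(w_abs <= lam, linear,
                    np.where(w_abs <= a * lam, quadratic, constant))


def grouped_scad_penalty(w, groups, group_lams, a=3.7):
    """Total SCAD penalty when each gene group has its own lambda.

    groups[i] is the (hypothetical) integer group index of weight i,
    and group_lams[k] is the tuning parameter assumed for group k.
    """
    lam_per_weight = np.asarray(group_lams, dtype=float)[np.asarray(groups)]
    return float(scad_penalty(w, lam_per_weight, a).sum())
```

In a penalized SVM of this kind, such a term would be added to the hinge loss so that weights of uninformative genes are driven toward zero while large weights are left nearly unpenalized; how gSVM-SCAD chooses the per-group parameters is not stated in the abstract.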