Tan Mehmet, Polat Faruk, Alhajj Reda
Department of Computer Engineering, TOBB University of Economics and Technology, Ankara, Turkey.
Department of Computer Engineering, Middle East Technical University, Ankara, Turkey.
Int J Data Min Bioinform. 2013;8(3):294-310. doi: 10.1504/ijdmb.2013.056080.
Classification of structured data is essential for a wide range of problems in bioinformatics and cheminformatics. One such problem is the in silico prediction of small-molecule properties such as toxicity, mutagenicity and activity. In this paper, we propose a new feature selection method for graph kernels that use the subtrees of graphs as their feature sets. For this purpose, we propose a masking procedure that amounts to feature selection. We present experiments on several data sets, together with a comparison of our method with some frequent-subgraph-based approaches.
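To illustrate why masking a subtree-based kernel amounts to feature selection, here is a minimal sketch. It is not the authors' algorithm. It assumes Weisfeiler-Lehman-style relabeling as the way of enumerating rooted subtree patterns and a plain linear kernel on subtree counts, k_M(G, G') = sum over f in M of phi_f(G) * phi_f(G'), where M is the set of unmasked features. The function names `subtree_features` and `masked_kernel` and the toy molecular graphs are hypothetical. Leaving a feature out of the mask removes its contribution to every kernel value, which is the same as deleting that feature from the feature set.

```python
from collections import Counter


def subtree_features(labels, adj, depth=2):
    """Count rooted subtree patterns up to `depth` (assumed WL-style relabeling).

    labels: list of node labels (str); adj: list of neighbour-index lists.
    At each iteration a node's new label encodes its previous label plus the
    sorted previous labels of its neighbours, i.e. a rooted subtree pattern.
    """
    current = list(labels)
    counts = Counter(current)  # depth-0 patterns: single labelled nodes
    for _ in range(depth):
        current = [
            current[v] + "(" + ",".join(sorted(current[u] for u in adj[v])) + ")"
            for v in range(len(labels))
        ]
        counts.update(current)
    return counts


def masked_kernel(f1, f2, mask):
    """Linear kernel on subtree counts restricted to the features in `mask`.

    Counter returns 0 for absent keys, so features missing from either
    graph contribute nothing.
    """
    return sum(f1[f] * f2[f] for f in mask)


# Toy graphs (hypothetical): a C-C-O chain and a C-O pair.
f1 = subtree_features(["C", "C", "O"], [[1], [0, 2], [1]], depth=1)
f2 = subtree_features(["C", "O"], [[1], [0]], depth=1)
full = masked_kernel(f1, f2, set(f1) | set(f2))  # all features kept
only_oc = masked_kernel(f1, f2, {"O(C)"})        # every other feature masked out
```

With all features kept the kernel value is 4 (shared patterns `C`, `O` and `O(C)`); masking everything except `O(C)` reduces it to 1. How the mask itself is chosen from training data is the substance of the paper and is not reproduced here.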