Bleik Said, Mishra Meenakshi, Huan Jun, Song Min
New Jersey Institute of Technology, Newark.
University of Kansas, Lawrence.
IEEE/ACM Trans Comput Biol Bioinform. 2013 Sep-Oct;10(5):1211-7. doi: 10.1109/TCBB.2013.16.
Graph representations of text have recently shown improved performance over conventional bag-of-words representations in text categorization. In this paper, we present a graph-based representation for biomedical articles and use graph kernels to classify those articles into high-level categories. In our representation, common biomedical concepts and semantic relationships are identified with the help of an existing ontology and used to build a rich graph structure. This structure provides a consistent feature set and preserves additional semantic information that could improve a classifier's performance. We attempt to classify the graphs using both a set-based graph kernel, which can handle the disconnected nature of the graphs, and a simple linear kernel. Finally, we report results comparing the classification performance of the kernel classifiers with that of common text-based classifiers.
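To make the contrast between the two kernel types concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the kernel defined in the paper: the `ConceptGraph` type, the concept and relation names, and the choice of a plain set-intersection kernel are all hypothetical. The linear kernel compares documents only by shared concepts, while the set-based kernel also counts shared labelled relations. Because it compares sets of nodes and edges rather than walks or paths, it is unaffected by a graph being disconnected.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class ConceptGraph:
    """Hypothetical document graph: ontology concepts plus labelled relations."""
    nodes: frozenset  # concept identifiers found in the article
    edges: frozenset  # (source, relation, target) triples


def linear_kernel(g1: ConceptGraph, g2: ConceptGraph) -> int:
    # Dot product of binary concept-indicator vectors,
    # which equals the number of concepts the two documents share.
    return len(g1.nodes & g2.nodes)


def set_kernel(g1: ConceptGraph, g2: ConceptGraph) -> int:
    # Assumed set-intersection kernel: shared concepts plus shared
    # labelled edges. No path or walk is followed, so disconnected
    # components contribute just like connected ones.
    return len(g1.nodes & g2.nodes) + len(g1.edges & g2.edges)


def normalized(kernel, g1: ConceptGraph, g2: ConceptGraph) -> float:
    # Cosine-style normalization so that graph size does not dominate.
    denom = sqrt(kernel(g1, g1) * kernel(g2, g2))
    return kernel(g1, g2) / denom if denom else 0.0


# Toy documents with made-up concepts and relations.
doc_a = ConceptGraph(
    nodes=frozenset({"aspirin", "inflammation", "pain"}),
    edges=frozenset({("aspirin", "treats", "inflammation")}),
)
doc_b = ConceptGraph(
    nodes=frozenset({"aspirin", "inflammation", "fever"}),
    edges=frozenset({
        ("aspirin", "treats", "inflammation"),
        ("aspirin", "treats", "fever"),
    }),
)
doc_c = ConceptGraph(
    nodes=frozenset({"insulin", "diabetes"}),
    edges=frozenset({("insulin", "treats", "diabetes")}),
)

print(linear_kernel(doc_a, doc_b))  # shared concepts only
print(set_kernel(doc_a, doc_b))     # shared concepts plus shared relations
print(set_kernel(doc_a, doc_c))     # no overlap at all
```

A Gram matrix built from either function could be passed to any classifier that accepts precomputed kernels. Both functions are dot products of indicator vectors, so the resulting matrices are positive semidefinite.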