Eisinger Daniel, Tsatsaronis George, Bundschus Markus, Wieneke Ulrich, Schroeder Michael
TU Dresden, BIOTEC, Tatzberg 47/49, 01307 Dresden, Germany.
J Biomed Semantics. 2013 Apr 15;4 Suppl 1(Suppl 1):S3. doi: 10.1186/2041-1480-4-S1-S3.
Document search on PubMed, the pre-eminent database for biomedical literature, relies on the annotation of its documents with relevant terms from the Medical Subject Headings ontology (MeSH) to improve recall through query expansion. Patent documents are another important information source, though they are considerably less accessible. One option for expanding patent search beyond pure keywords is to include classification information: since every patent is assigned at least one class code, it should be possible to use these assignments automatically in much the same way as the MeSH annotations in PubMed. Developing a system for this task requires a good understanding of the properties of both classification systems. This report describes our comparative analysis of MeSH and the main patent classification system, the International Patent Classification (IPC). We investigate the hierarchical structures and the properties of the terms and classes, and we compare the assignment of IPC codes to patents with the annotation of PubMed documents with MeSH terms.

Our analysis shows a strong structural similarity between the hierarchies, but significant differences in terms and annotations. The low number of IPC class assignments and the lack of occurrences of class labels in patent texts imply that current patent search is severely limited. To overcome these limits, we evaluate a method for the automated assignment of additional classes to patent documents, and we propose a system for guided patent search based on class co-occurrence information and external resources.