Miñarro-Giménez Jose Antonio, Kreuzthaler Markus, Bernhardt-Melischnig Johannes, Martínez-Costa Catalina, Schulz Stefan
Institute of Medical Informatics, Statistics, and Documentation, Medical University of Graz, Austria.
Stud Health Technol Inform. 2015;216:716-20.
The massive accumulation of biomedical knowledge is reflected in the growth of the literature database MEDLINE, which holds over 23 million bibliographic records. All records are manually indexed with MeSH descriptors, many of them refined by MeSH subheadings. We use subheading information to cluster types of MeSH descriptor co-occurrences in MEDLINE by processing co-occurrence information provided by the UMLS. The goal is to infer a plausible predicate for each resulting cluster. In an initial experiment, disease-pharmacologic substance co-occurrences were grouped into six clusters, and a domain expert then manually assigned meaningful predicates to the clusters. The mean accuracy of the ten best generated biomedical facts per cluster was 85%. This result supports the potential of MeSH subheadings for extracting plausible medical predications from MEDLINE.
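The pipeline outlined in the abstract (subheading profiles per descriptor pair, clustering, expert-assigned predicates) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the abstract does not name the clustering algorithm, so plain k-means is swapped in; the toy records, the feature encoding, the seed pairs, and the predicate labels are hypothetical; and two clusters are used instead of the six reported, to keep the example small.

```python
from collections import Counter

# Hypothetical toy input, NOT data from the paper. Each record is one
# co-occurrence: (disease, substance, disease subheading, substance subheading).
RECORDS = [
    ("Hypertension", "Enalapril", "drug therapy", "therapeutic use"),
    ("Hypertension", "Enalapril", "drug therapy", "therapeutic use"),
    ("Asthma", "Albuterol", "drug therapy", "therapeutic use"),
    ("Asthma", "Albuterol", "drug therapy", "administration & dosage"),
    ("Liver Failure", "Acetaminophen", "chemically induced", "adverse effects"),
    ("Liver Failure", "Acetaminophen", "chemically induced", "poisoning"),
    ("Agranulocytosis", "Clozapine", "chemically induced", "adverse effects"),
]


def build_vectors(records):
    """Turn each (disease, substance) pair into a subheading frequency profile."""
    counts = {}
    for disease, substance, d_sub, s_sub in records:
        profile = counts.setdefault((disease, substance), Counter())
        profile["disease/" + d_sub] += 1
        profile["substance/" + s_sub] += 1
    features = sorted({f for profile in counts.values() for f in profile})
    vectors = {}
    for pair, profile in counts.items():
        total = sum(profile.values())
        # Relative frequencies, so frequent and rare pairs are comparable.
        vectors[pair] = [profile[f] / total for f in features]
    return features, vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, seeds, iterations=10):
    """Plain k-means with explicit seed pairs (deterministic, no random init)."""
    centroids = [list(vectors[s]) for s in seeds]
    assignment = {}
    for _ in range(iterations):
        # Assign every pair to its nearest centroid.
        assignment = {
            pair: min(
                range(len(centroids)),
                key=lambda i: squared_distance(vec, centroids[i]),
            )
            for pair, vec in vectors.items()
        }
        # Move each centroid to the mean of its members.
        for i in range(len(centroids)):
            members = [vectors[p] for p, c in assignment.items() if c == i]
            if members:
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


features, vectors = build_vectors(RECORDS)
clusters = kmeans(
    vectors,
    seeds=[("Hypertension", "Enalapril"), ("Liver Failure", "Acetaminophen")],
)

# Hypothetical labels standing in for the domain expert's manual assignment.
PREDICATES = {0: "treats", 1: "causes"}
facts = sorted(
    f"{substance} {PREDICATES[cluster]} {disease}"
    for (disease, substance), cluster in clusters.items()
)
for fact in facts:
    print(fact)
```

In this sketch the predicate is attached to a whole cluster rather than to individual pairs, which mirrors the manual step described in the abstract: the expert labels each cluster once, and every pair in it inherits that label as a candidate fact.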