Modeling and mining term association for improving biomedical information retrieval performance.

Affiliation

Information Retrieval and Knowledge Management Research Lab, York University, Toronto, ON, M3J1P3, Canada.

Publication information

BMC Bioinformatics. 2012 Jun 11;13 Suppl 9(Suppl 9):S2. doi: 10.1186/1471-2105-13-S9-S2.

DOI:10.1186/1471-2105-13-S9-S2

PMID:22901087

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3372456/

Abstract

BACKGROUND

The growth of biomedical information requires most information retrieval systems to provide short and specific answers to complex user queries. Semantic information in the form of free text is structured in a way that makes it straightforward for humans to read but more difficult for computers to interpret automatically and search efficiently. One reason is that most traditional information retrieval models assume that terms are conditionally independent given a document or passage. We are therefore motivated to consider term associations within different contexts, to help the models capture semantic information and use it to improve biomedical information retrieval performance.
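
As an illustration of the independence assumption, one standard form (the unigram query-likelihood model, shown here as an example and not necessarily the baseline used in this work) scores a query Q = q_1, ..., q_n against a document or passage D as a product of per-term probabilities:

```latex
P(Q \mid D) = \prod_{i=1}^{n} P(q_i \mid D)
```

Under this factorization, the tendency of two keywords to appear together contributes nothing beyond their individual probabilities, which is the information that term association aims to recover.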

RESULTS

We propose a term association approach that discovers associations among the keywords of a query. Experiments are conducted on the TREC 2004-2007 Genomics data sets and the TREC 2004 HARD data set. The proposed approach is promising and outperforms both the baselines and the GSP results. We investigate the parameter settings and different indices: the sentence-based index produces the best results at the document level, the word-based index at the passage level, and the paragraph-based index at the passage2 level. Furthermore, the best term association results always come from the best baseline. The tuning parameter k in the proposed recursive re-ranking algorithm is discussed and locally optimized to 10.

CONCLUSIONS

First, modelling term association with factor analysis to improve biomedical information retrieval is one of the major contributions of our work. Second, the experiments confirm that term association, which considers co-occurrence and dependency among the keywords, produces better results than baselines that treat the keywords independently. Third, the baselines are re-ranked according to the importance and reliance of the latent factors behind term associations. These latent factors are determined by the proposed model and by the appearances of their terms in the passages retrieved in the first round.
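
The role of keyword co-occurrence in the first-round passages can be illustrated with a minimal sketch. It is not the factor analysis model proposed in this work: it swaps in plain pairwise co-occurrence counts as a stand-in for the latent factors, and the function names `cooccurrence_counts` and `rerank`, the whitespace tokenizer, and the single-pass re-ranking are all assumptions made for illustration. Only the default cutoff k = 10 is taken from the abstract.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(query_terms, passages):
    """Count how many passages contain each pair of query terms."""
    terms = sorted(set(t.lower() for t in query_terms))
    counts = Counter()
    for text in passages:
        tokens = set(text.lower().split())  # naive tokenizer (assumption)
        present = [t for t in terms if t in tokens]
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


def rerank(query_terms, ranked_passages, k=10):
    """Re-rank the top-k passages by the co-occurrence weight of the
    query-term pairs each passage contains; the rest keep their order."""
    top, rest = ranked_passages[:k], ranked_passages[k:]
    counts = cooccurrence_counts(query_terms, top)
    terms = sorted(set(t.lower() for t in query_terms))

    def score(text):
        tokens = set(text.lower().split())
        present = [t for t in terms if t in tokens]
        return sum(counts[pair] for pair in combinations(present, 2))

    # sorted() is stable, so tied passages keep their first-round order.
    return sorted(top, key=score, reverse=True) + rest
```

Passages that contain several strongly associated keywords move up, while passages matching a single keyword stay behind them, which is the qualitative effect the conclusions describe for association-aware re-ranking.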
