Liu Rey-Long
Department of Medical Informatics, Tzu Chi University, Hualien, Taiwan, R. O. C.
PLoS One. 2015 Oct 6;10(10):e0139245. doi: 10.1371/journal.pone.0139245. eCollection 2015.
Biomedical literature is an essential source of biomedical evidence. To translate this evidence into biomedical research, researchers often need to read multiple articles on a specific biomedical issue carefully. These articles therefore need to be highly related to one another: they should share similar core content, including research goals, methods, and findings. However, given an article r, it is challenging for search engines to retrieve articles that are highly related to r. In this paper, we present PBC (Passage-based Bibliographic Coupling), a technique that estimates inter-article similarity by integrating bibliographic coupling with information collected from the context passages around important out-link citations (references) in each article. Empirical evaluation shows that PBC significantly improves the retrieval of articles that biomedical experts judge to be highly related to specific articles about gene-disease associations. PBC can thus be used to improve how search engines retrieve highly related articles for any given article r, even when r is cited by very few (or even no) articles. This contribution is essential for researchers and text mining systems that aim to cross-validate the evidence about specific gene-disease associations.
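To make the idea of bibliographic coupling concrete, the following is a minimal sketch in Python. It is not the PBC method itself, because the abstract does not give PBC's formula. The first function is classical bibliographic coupling with Jaccard normalisation, one common choice. The second function is a hypothetical variant that assumes each reference already carries an importance weight derived from its citation passages. How such weights would be computed is left open here, and all function and parameter names are illustrative assumptions.

```python
from math import sqrt


def bibliographic_coupling(refs_a, refs_b):
    """Classical bibliographic coupling: overlap of two reference lists.

    Normalised here with the Jaccard coefficient (an assumption; other
    normalisations are also in use).
    """
    a, b = set(refs_a), set(refs_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def passage_weighted_coupling(weights_a, weights_b):
    """Hypothetical passage-aware variant, NOT the formula from the paper.

    Each argument maps a reference id to an assumed importance weight,
    for example one derived from the passages in which that reference is
    cited. The score is the cosine similarity of the two weight vectors.
    """
    shared = set(weights_a) & set(weights_b)
    dot = sum(weights_a[r] * weights_b[r] for r in shared)
    norm_a = sqrt(sum(w * w for w in weights_a.values()))
    norm_b = sqrt(sum(w * w for w in weights_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Neither function looks at in-link citations, which is why coupling-based similarity can still be computed for an article r that has been cited by very few (or no) articles: only the reference lists of the two articles are needed.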