Department of Health Sciences, University of Wisconsin-Milwaukee, Milwaukee, Wisconsin, United States of America.
PLoS One. 2010 Oct 7;5(10):e12983. doi: 10.1371/journal.pone.0012983.
BACKGROUND: Figures present important experimental results and are typically reported in full-text bioscience articles. Bioscience researchers need to access figures to validate research facts and to formulate or test novel research hypotheses. However, the sheer volume of bioscience literature has made it difficult to access figures. We are therefore developing an intelligent figure search engine (http://figuresearch.askhermes.org). Existing research in figure search treats all figures equally; we introduce the novel concept of "figure ranking": figures appearing in a full-text biomedical article can be ranked by their contribution to knowledge discovery.
METHODOLOGY/FINDINGS: We empirically validated the hypothesis of figure ranking with over 100 bioscience researchers, and then developed unsupervised natural language processing (NLP) approaches to rank figures automatically. Evaluated on a collection of 202 full-text articles in which the authors had ranked the figures by importance, our best system achieved a weighted error rate of 0.2, which is significantly better than several baseline systems we explored. We further explored a user interface application in which we built novel user interfaces (UIs) incorporating figure ranking, allowing bioscience researchers to access important figures efficiently. Our evaluation results show that 92% of the bioscience researchers ranked the user interfaces in which the most important figures are enlarged among their top two choices. Bioscience researchers preferred the UIs in which the most important figures were predicted by our automatic figure ranking NLP system over the UIs in which the most important figures were randomly assigned. In addition, our results show no statistically significant difference in bioscience researchers' preference between UIs generated by automatic figure ranking and UIs generated from human ranking annotation.
CONCLUSION/SIGNIFICANCE: The evaluation results indicate that the automatic figure ranking and user interfaces reported in this study can be fully implemented in online publishing. The novel user interface, integrated with the automatic figure ranking system, provides a more efficient and robust way to access scientific information in the biomedical domain, and will further enhance our existing figure search engine so that bioscientists can more easily access figures of interest.
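To make the idea of unsupervised figure ranking concrete, the following is a minimal sketch only. The abstract does not specify which NLP features or scoring function were used, so the approach here (bag-of-words cosine similarity between each figure caption and the article abstract) is an assumption chosen for illustration, and the names `tokenize`, `cosine`, and `rank_figures`, as well as the sample texts, are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_figures(abstract, captions):
    """Order figure ids from most to least similar to the abstract.

    Assumption: caption-to-abstract lexical overlap is used as a stand-in
    for a figure's importance; this is not the scoring from the study.
    """
    reference = Counter(tokenize(abstract))
    scores = {
        fig_id: cosine(Counter(tokenize(caption)), reference)
        for fig_id, caption in captions.items()
    }
    # Sort by descending score; break ties by figure id for determinism.
    return sorted(scores, key=lambda fig_id: (-scores[fig_id], fig_id))


if __name__ == "__main__":
    # Hypothetical example texts, not taken from any real article.
    example_abstract = "Protein kinase A regulates cell growth in yeast"
    example_captions = {
        "fig1": "Schematic of the experimental workflow",
        "fig2": "Protein kinase A activity regulates yeast cell growth",
        "fig3": "Growth curves of yeast strains",
    }
    print(rank_figures(example_abstract, example_captions))
```

A ranking produced this way could then be compared against author-provided rankings, or used to decide which figure a user interface enlarges. The weighted error rate reported above is not defined in the abstract, so no evaluation metric is sketched here.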