

A semantic graph-based approach to biomedical summarisation.

Affiliation

Departamento de Ingeniería del Software e Inteligencia Artificial, Universidad Complutense de Madrid, C/Profesor José García Santesmases, Spain.

Publication information

Artif Intell Med. 2011 Sep;53(1):1-14. doi: 10.1016/j.artmed.2011.06.005. Epub 2011 Jul 12.

Abstract

OBJECTIVE

Access to the vast body of research literature that is available in biomedicine and related fields may be improved by automatic summarisation. This paper presents a method for summarising biomedical scientific literature that takes into consideration the characteristics of the domain and the type of documents.

METHODS

To identify salient sentences in biomedical texts, concepts and relations derived from the Unified Medical Language System (UMLS) are arranged into a semantic graph that represents the document. A degree-based clustering algorithm is then used to identify the different themes or topics within the text. Several sentence-selection heuristics, each intended to generate a different type of summary, are tested. A worked example on a real document illustrates how the method works.
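The pipeline described above (concept graph, degree-based clustering, sentence selection) can be illustrated with a minimal sketch. It is not the authors' implementation: the concept IDs are invented, edges are drawn from simple co-occurrence within a sentence as a stand-in for UMLS relations, and the function names `build_graph`, `cluster_by_degree` and `select_sentences` are hypothetical. The single heuristic shown (favour sentences that overlap the largest cluster) is only one plausible choice; the abstract does not specify the heuristics that were tested.

```python
from collections import defaultdict
from itertools import combinations

# Assumption: each sentence has already been mapped to a set of concept IDs.
# In the paper the concepts come from UMLS; these IDs are invented.
sentences = [
    {"C1", "C2", "C3"},
    {"C1", "C4"},
    {"C5", "C6"},
    {"C1", "C2"},
    {"C5", "C6", "C7"},
]


def build_graph(sentences):
    """Link every pair of concepts that co-occur in a sentence.

    Co-occurrence is a stand-in here for relations taken from UMLS.
    """
    graph = defaultdict(set)
    for concepts in sentences:
        for a, b in combinations(sorted(concepts), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def cluster_by_degree(graph, n_hubs):
    """Choose high-degree concepts as hubs, then attach neighbours to them."""
    # Rank vertices by degree (ties broken by name for determinism).
    ranked = sorted(graph, key=lambda v: (-len(graph[v]), v))
    hubs = []
    for v in ranked:
        # Skip candidates adjacent to an existing hub so hubs mark separate topics.
        if len(hubs) < n_hubs and not any(v in graph[h] for h in hubs):
            hubs.append(v)
    clusters = {h: {h} for h in hubs}
    for v in ranked:
        if v in clusters:
            continue
        linked = [h for h in hubs if h in graph[v]]
        if linked:
            clusters[linked[0]].add(v)
    return clusters


def select_sentences(sentences, clusters, k):
    """Return indices of the k sentences that best cover the largest cluster."""
    main = max(clusters.values(), key=len)
    ranked = sorted(range(len(sentences)),
                    key=lambda i: (-len(sentences[i] & main), i))
    return sorted(ranked[:k])  # restore document order


graph = build_graph(sentences)
clusters = cluster_by_degree(graph, n_hubs=2)
summary = select_sentences(sentences, clusters, k=2)
```

With this toy input the two hubs fall in separate groups of concepts, so the clusters correspond to two topics. A heuristic that drew sentences from every cluster rather than only the largest would be one way to surface secondary information as well.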

RESULTS

A large-scale evaluation is performed using the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics. The results are compared with those achieved by three well-known summarisers (two research prototypes and a commercial application) and two baselines. Our method significantly outperforms all summarisers and baselines. The best of our heuristics improves the ROUGE-1 score by almost 7.7% relative to the LexRank summariser (0.7862 versus 0.7302, an absolute gain of 0.056). A qualitative analysis of the summaries also shows that our method succeeds in identifying sentences that cover the main topic of the document, while also including other secondary or "satellite" information that might be relevant to the user.
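To make the ROUGE-1 comparison concrete, the sketch below computes a simplified unigram recall and checks the arithmetic behind the reported scores. It is an illustration under stated assumptions, not the official ROUGE toolkit: tokenisation is plain lowercased whitespace splitting with no stemming or stop-word handling, and `rouge1_recall` is a hypothetical helper name. The two scores are the ones quoted in the abstract; their difference is 0.056 in absolute terms, which is about 7.7% of the LexRank score.

```python
from collections import Counter


def rouge1_recall(candidate, reference):
    """Clipped unigram overlap divided by the number of reference unigrams."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Each reference token can be matched at most as often as it occurs
    # in the candidate summary.
    overlap = sum(min(count, cand[token]) for token, count in ref.items())
    return overlap / sum(ref.values()) if ref else 0.0


# Scores reported in the abstract.
ours, lexrank = 0.7862, 0.7302
absolute_gain = ours - lexrank            # difference between the two scores
relative_gain = absolute_gain / lexrank   # gain as a fraction of the LexRank score
```

For instance, `rouge1_recall("the cat sat", "the cat sat on the mat")` matches three of the six reference tokens.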

CONCLUSION

The proposed method proves to be an efficient approach to biomedical literature summarisation, which confirms that using concepts rather than terms can be very useful in automatic summarisation, especially in highly specialised domains.

