Active learning-based information structure analysis of full scientific articles and two applications for biomedical literature review.

Affiliation

Computer Laboratory, University of Cambridge, Cambridge, UK.

Publication information

Bioinformatics. 2013 Jun 1;29(11):1440-7. doi: 10.1093/bioinformatics/btt163. Epub 2013 Apr 5.

DOI: 10.1093/bioinformatics/btt163
PMID: 23564844

Abstract

MOTIVATION

Techniques that can automatically analyze the information structure of scientific articles could be highly useful for improving information access to biomedical literature. However, most existing approaches rely on supervised machine learning (ML) and substantial labeled data that are expensive to develop and to apply to different sub-fields of biomedicine. Recent research shows that minimal supervision is sufficient for fairly accurate information structure analysis of biomedical abstracts. But is it realistic for full articles, given their high linguistic and informational complexity? We introduce and release a novel corpus of 50 biomedical articles annotated according to the Argumentative Zoning (AZ) scheme, and investigate active learning with one of the most widely used ML models, Support Vector Machines (SVM), on this corpus. Additionally, we introduce two novel applications that use AZ to support real-life literature review in biomedicine via question answering and summarization.

RESULTS

We show that active learning with an SVM trained on 500 labeled sentences (6% of the corpus) performs surprisingly well, with an accuracy of 82%, just 2% lower than fully supervised learning. In our question answering task, biomedical researchers find relevant information significantly faster from AZ-annotated articles than from unannotated ones. In the summarization task, sentences extracted from particular zones are significantly more similar to gold standard summaries than those extracted from particular sections of full articles. These results demonstrate that active learning of full articles' information structure is indeed realistic and that the accuracy is high enough to support real-life literature review in biomedicine.
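
As an illustration only, the general idea of active learning with an SVM can be sketched as a loop that trains on a small labeled set, then asks for labels on the examples the current model is least sure about. The abstract does not say which query strategy, features, or SVM implementation the authors used, so everything below is an assumption: a binary toy problem rather than multi-class AZ labels, margin-based uncertainty sampling, a hand-written linear SVM, and hypothetical names such as `active_learn` and `make_toy_data`.

```python
import random

def decision(w, b, x):
    """Score of x under the linear model (w, b); the sign gives the class."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_linear_svm(X, y, epochs=100, lr=0.05, lam=0.01):
    """Fit a linear SVM by subgradient descent on the L2-regularized
    hinge loss. Labels must be +1 or -1."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            violated = yi * decision(w, b, xi) < 1
            # L2 shrinkage always applies; the hinge term only when the
            # example lies inside the margin or is misclassified.
            w = [wj - lr * (lam * wj - (yi * xj if violated else 0.0))
                 for wj, xj in zip(w, xi)]
            if violated:
                b += lr * yi
    return w, b

def active_learn(X, oracle, seed_idx, rounds, batch):
    """Margin-based uncertainty sampling (an assumed strategy): each round,
    label the `batch` pool points closest to the decision boundary."""
    labeled = list(seed_idx)
    seen = set(labeled)
    pool = [i for i in range(len(X)) if i not in seen]
    for _ in range(rounds):
        # A real system would cache labels instead of re-asking the oracle.
        w, b = train_linear_svm([X[i] for i in labeled],
                                [oracle(i) for i in labeled])
        pool.sort(key=lambda i: abs(decision(w, b, X[i])))
        labeled.extend(pool[:batch])
        pool = pool[batch:]
    w, b = train_linear_svm([X[i] for i in labeled],
                            [oracle(i) for i in labeled])
    return w, b, labeled

def make_toy_data(n_per_class=100, seed=0):
    """Two Gaussian blobs standing in for sentence feature vectors."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_per_class):
        X.append([rng.gauss(2, 1), rng.gauss(2, 1)])
        y.append(1)
        X.append([rng.gauss(-2, 1), rng.gauss(-2, 1)])
        y.append(-1)
    return X, y

X, y = make_toy_data()
# The oracle plays the role of the human annotator.
w, b, labeled = active_learn(X, lambda i: y[i], seed_idx=[0, 1],
                             rounds=5, batch=4)
accuracy = sum((decision(w, b, xi) >= 0) == (yi == 1)
               for xi, yi in zip(X, y)) / len(X)
print(f"labeled {len(labeled)} of {len(X)} points, accuracy {accuracy:.2f}")
```

The point of the sketch is the labeling budget: only the seed examples plus `rounds * batch` queried points are ever labeled, which mirrors the abstract's setting of training on a small fraction of the corpus.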

AVAILABILITY

The annotated corpus, our AZ classifier and the two novel applications are available at http://www.cl.cam.ac.uk/yg244/12bioinfo.html
