Automatically classifying sentences in full-text biomedical articles into Introduction, Methods, Results and Discussion.

Affiliation

University of Wisconsin, Milwaukee, Milwaukee WI 53211, USA.

Publication information

Bioinformatics. 2009 Dec 1;25(23):3174-80. doi: 10.1093/bioinformatics/btp548. Epub 2009 Sep 25.

Abstract

Biomedical texts can typically be represented by four rhetorical categories: Introduction, Methods, Results and Discussion (IMRAD). Classifying sentences into these categories can benefit many other text-mining tasks. Although many studies have applied different approaches to automatically classify sentences in MEDLINE abstracts into the IMRAD categories, few have explored the classification of sentences that appear in full-text biomedical articles. We first evaluated whether sentences in full-text biomedical articles could be reliably annotated with the IMRAD categories and then explored different approaches for automatically classifying these sentences into those categories. Our results show an overall annotation agreement of 82.14% with a Kappa score of 0.756. The best classification system is a multinomial naïve Bayes classifier trained on manually annotated data; it achieved 91.95% accuracy and an average F-score of 91.55%, significantly higher than the baseline systems. A web version of this system is available online at http://wood.ims.uwm.edu/full_text_classifier/.
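The two quantitative ideas in the abstract, chance-corrected annotator agreement and multinomial naïve Bayes sentence classification, can be illustrated with a minimal Python sketch. This is not the authors' system. It makes several assumptions that the abstract does not state: Cohen's two-annotator form of kappa, whitespace tokenization, bag-of-words counts and add-one smoothing. The class name `TinyMultinomialNB`, the function `cohen_kappa` and the toy sentences are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same sentences."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: both annotators pick the same label independently.
    expected = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / (n * n)
    if expected == 1.0:  # degenerate case: only one label was ever used
        return 1.0
    return (observed - expected) / (1.0 - expected)


class TinyMultinomialNB:
    """Multinomial naive Bayes over word counts with add-one smoothing."""

    def fit(self, sentences, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(sentences, labels):
            tokens = text.lower().split()  # assumed tokenizer
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, sentence):
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total)  # log prior
            denom = sum(self.word_counts[label].values()) + vocab_size
            for token in sentence.lower().split():
                if token in self.vocab:  # words unseen in training are skipped
                    score += math.log((self.word_counts[label][token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data (invented for illustration only).
train = [
    ("previous studies have explored abstracts", "Introduction"),
    ("we trained a classifier on annotated sentences", "Methods"),
    ("the classifier achieved high accuracy", "Results"),
    ("these findings suggest future work", "Discussion"),
]
model = TinyMultinomialNB().fit([s for s, _ in train], [y for _, y in train])
predicted = model.predict("we trained the classifier")

annotator_1 = ["Introduction", "Methods", "Results", "Discussion"]
annotator_2 = ["Introduction", "Methods", "Results", "Results"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

Kappa discounts the raw agreement rate by the agreement expected from chance, which is why an agreement figure such as 82.14% corresponds to a lower kappa value. A real system would need far more training data and a proper tokenizer than this toy setup.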
