Automatically classifying sentences in full-text biomedical articles into Introduction, Methods, Results and Discussion.

Affiliation

University of Wisconsin, Milwaukee, Milwaukee WI 53211, USA.

Publication information

Bioinformatics. 2009 Dec 1;25(23):3174-80. doi: 10.1093/bioinformatics/btp548. Epub 2009 Sep 25.

DOI: 10.1093/bioinformatics/btp548

PMID: 19783830

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2913661/

Abstract

Biomedical texts can typically be represented by four rhetorical categories: Introduction, Methods, Results and Discussion (IMRAD). Classifying sentences into these categories can benefit many other text-mining tasks. Although many studies have applied different approaches for automatically classifying sentences in MEDLINE abstracts into the IMRAD categories, few have explored the classification of sentences that appear in full-text biomedical articles. We first evaluated whether sentences in full-text biomedical articles could be reliably annotated into the IMRAD format and then explored different approaches for automatically classifying these sentences into the IMRAD categories. Our results show an overall annotation agreement of 82.14% with a Kappa score of 0.756. The best classification system is a multinomial naïve Bayes classifier trained on manually annotated data that achieved 91.95% accuracy and an average F-score of 91.55%, which is significantly higher than baseline systems. A web version of this system is available online at http://wood.ims.uwm.edu/full_text_classifier/.
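
The two quantities named in the abstract, annotator agreement measured by kappa and a multinomial naïve Bayes sentence classifier, can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say which kappa variant or which features were used, so the sketch assumes Cohen's kappa for two annotators, whitespace-tokenized bag-of-words features, and add-one (Laplace) smoothing. The class name `MultinomialNB`, the function `cohen_kappa`, and the four training sentences are hypothetical and exist only for illustration.

```python
import math
from collections import Counter, defaultdict


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items (assumed variant)."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: both annotators pick the same label independently.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


class MultinomialNB:
    """Bag-of-words multinomial naive Bayes with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant (assumption, not from the paper)

    def fit(self, sentences, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for sentence, label in zip(sentences, labels):
            self.word_counts[label].update(sentence.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, sentence):
        # Words never seen in training are ignored.
        tokens = [t for t in sentence.lower().split() if t in self.vocab]
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_sentences in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            # Log prior plus sum of smoothed log likelihoods.
            score = math.log(n_sentences / total)
            for t in tokens:
                score += math.log((counts[t] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented for illustration only.
train = [
    ("little is known about the role of this protein", "Introduction"),
    ("cells were incubated and centrifuged at room temperature", "Methods"),
    ("expression increased significantly as shown in figure 2", "Results"),
    ("these findings suggest a possible regulatory mechanism", "Discussion"),
]
model = MultinomialNB().fit([s for s, _ in train], [y for _, y in train])
prediction = model.predict("samples were centrifuged and incubated overnight")

# Two hypothetical annotators who disagree on one of four sentences.
kappa = cohen_kappa(["I", "M", "R", "D"], ["I", "M", "R", "R"])
```

With real data, the classifier would be trained on the manually annotated sentences and scored with accuracy and per-category F-score on held-out sentences; the toy corpus here is far too small to say anything about performance.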

Similar articles

Automatically classifying sentences in full-text biomedical articles into Introduction, Methods, Results and Discussion.

Bioinformatics. 2009 Dec 1;25(23):3174-80. doi: 10.1093/bioinformatics/btp548. Epub 2009 Sep 25.

Automatically classifying sentences in full-text biomedical articles into introduction, methods, results and discussion.

Summit Transl Bioinform. 2009 Mar 1;2009:6-10.

Challenges for automatically extracting molecular interactions from full-text articles.

BMC Bioinformatics. 2009 Sep 24;10:311. doi: 10.1186/1471-2105-10-311.

The BioScope corpus: biomedical texts annotated for uncertainty, negation and their scopes.

BMC Bioinformatics. 2008 Nov 19;9 Suppl 11(Suppl 11):S9. doi: 10.1186/1471-2105-9-S11-S9.

Automatic classification of sentences to support Evidence Based Medicine.

BMC Bioinformatics. 2011 Mar 29;12 Suppl 2(Suppl 2):S5. doi: 10.1186/1471-2105-12-S2-S5.

Using argumentation to extract key sentences from biomedical abstracts.

Int J Med Inform. 2007 Feb-Mar;76(2-3):195-200. doi: 10.1016/j.ijmedinf.2006.05.002. Epub 2006 Jul 11.

BioRAT: extracting biological information from full-length papers.

Bioinformatics. 2004 Nov 22;20(17):3206-13. doi: 10.1093/bioinformatics/bth386. Epub 2004 Jul 1.

Semantic role labeling for protein transport predicates.

BMC Bioinformatics. 2008 Jun 11;9:277. doi: 10.1186/1471-2105-9-277.

Sentence retrieval for abstracts of randomized controlled trials.

BMC Med Inform Decis Mak. 2009 Feb 10;9:10. doi: 10.1186/1472-6947-9-10.

The structural and content aspects of abstracts versus bodies of full text journal articles are different.

BMC Bioinformatics. 2010 Sep 29;11:492. doi: 10.1186/1471-2105-11-492.

Cited by

Characterization and automated classification of sentences in the biomedical literature: a case study for biocuration of gene expression and protein kinase activity.

bioRxiv. 2025 Jan 8:2025.01.06.631539. doi: 10.1101/2025.01.06.631539.

Research on the structure function recognition of PLOS.

Front Artif Intell. 2024 Jan 24;7:1254671. doi: 10.3389/frai.2024.1254671. eCollection 2024.

Beyond opinion classification: Extracting facts, opinions and experiences from health forums.

PLoS One. 2019 Jan 9;14(1):e0209961. doi: 10.1371/journal.pone.0209961. eCollection 2019.

Automatic recognition of self-acknowledged limitations in clinical research literature.

J Am Med Inform Assoc. 2018 Jul 1;25(7):855-861. doi: 10.1093/jamia/ocy038.

Biomedical text mining for research rigor and integrity: tasks, challenges, directions.

Brief Bioinform. 2018 Nov 27;19(6):1400-1414. doi: 10.1093/bib/bbx057.

Section level search functionality in Europe PMC.

J Biomed Semantics. 2015 Mar 10;6:7. doi: 10.1186/s13326-015-0003-7. eCollection 2015.

Automated confidence ranked classification of randomized controlled trial articles: an aid to evidence-based medicine.

J Am Med Inform Assoc. 2015 May;22(3):707-17. doi: 10.1093/jamia/ocu025. Epub 2015 Feb 5.

Figure-associated text summarization and evaluation.

PLoS One. 2015 Feb 2;10(2):e0115671. doi: 10.1371/journal.pone.0115671. eCollection 2015.

Studying PubMed usages in the field for complex problem solving: Implications for tool design.

J Am Soc Inf Sci Technol. 2013 May 1;64(5):874-92. doi: 10.1002/asi.22796.

GeneRIF indexing: sentence selection based on machine learning.

BMC Bioinformatics. 2013 May 31;14:171. doi: 10.1186/1471-2105-14-171.

References

Summit Transl Bioinform. 2009 Mar 1;2009:6-10.

Are figure legends sufficient? Evaluating the contribution of associated text to biomedical figure comprehension.

J Biomed Discov Collab. 2009 Jan 6;4:1. doi: 10.1186/1747-5333-4-1.

Multi-dimensional classification of biomedical text: toward automated, practical provision of high-utility text to diverse users.

Bioinformatics. 2008 Sep 15;24(18):2086-93. doi: 10.1093/bioinformatics/btn381. Epub 2008 Aug 20.

Development, implementation, and a cognitive evaluation of a definitional question answering system for physicians.

J Biomed Inform. 2007 Jun;40(3):236-51. doi: 10.1016/j.jbi.2007.03.002. Epub 2007 Mar 12.

New directions in biomedical text annotation: definitions, guidelines and corpus construction.

BMC Bioinformatics. 2006 Jul 25;7:356. doi: 10.1186/1471-2105-7-356.

Zone analysis in biology articles as a basis for information extraction.

Int J Med Inform. 2006 Jun;75(6):468-87. doi: 10.1016/j.ijmedinf.2005.06.013. Epub 2005 Aug 19.

The introduction, methods, results, and discussion (IMRAD) structure: a fifty-year survey.

J Med Libr Assoc. 2004 Jul;92(3):364-7.

Categorization of sentence types in medical abstracts.

AMIA Annu Symp Proc. 2003;2003:440-4.

Evaluation of text data mining for database curation: lessons learned from the KDD Challenge Cup.

Bioinformatics. 2003;19 Suppl 1:i331-9. doi: 10.1093/bioinformatics/btg1046.

Tissue-specific distributions of alternatively spliced human PECAM-1 isoforms.

Am J Physiol Heart Circ Physiol. 2003 Mar;284(3):H1008-17. doi: 10.1152/ajpheart.00600.2002. Epub 2002 Nov 14.
