Categorization of sentence types in medical abstracts.

Authors

McKnight Larry, Srinivasan Padmini

Affiliation

Department of Medical Informatics, Columbia University, New York, NY, USA.

Publication information

AMIA Annu Symp Proc. 2003;2003:440-4.

Abstract

This study evaluated the use of machine learning techniques for classifying sentence types. A total of 7253 structured abstracts and 204 unstructured abstracts of randomized controlled trials (RCTs) from MEDLINE were parsed into sentences, and each sentence was labeled as one of four types (Introduction, Method, Result, or Conclusion). Support Vector Machine (SVM) and Linear Classifier models were generated and evaluated on cross-validated data. Treating sentences as a simple "bag of words", the SVM model had an average ROC area of 0.92. Adding a feature of relative sentence location improved performance markedly for some models and increased the overall average ROC area to 0.95. Linear classifier performance was significantly worse than that of the SVM in all datasets. Using the SVM model trained on structured abstracts to predict unstructured abstracts yielded performance similar to that of models trained with unstructured abstracts in 3 of the 4 types. We conclude that classification of sentence type seems feasible within the domain of RCTs. Identification of sentence types may be helpful for providing context to end users or to other text summarization techniques.
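The feature construction described in the abstract (a "bag of words" per sentence plus a relative sentence location feature) can be illustrated with a minimal sketch. This is not the authors' code: the sentence splitter, the tokenizer, the function names, and the `__rel_pos__` feature key are all assumptions made for illustration, and the classifier itself (SVM or linear) is left out.

```python
import re
from collections import Counter


def split_sentences(abstract):
    """Naive sentence splitter (an assumption): break after ., ! or ? plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", abstract.strip())
    return [p for p in parts if p]


def featurize(abstract):
    """Return one feature dict per sentence: word counts plus relative position."""
    sentences = split_sentences(abstract)
    n = len(sentences)
    rows = []
    for i, sentence in enumerate(sentences):
        # Bag of words: lowercase alphanumeric tokens, word order discarded.
        tokens = re.findall(r"[a-z0-9]+", sentence.lower())
        features = Counter(tokens)
        # Relative sentence location in [0, 1]: 0.0 for the first sentence,
        # 1.0 for the last. The key cannot collide with a token because
        # tokens never contain underscores.
        features["__rel_pos__"] = i / (n - 1) if n > 1 else 0.0
        rows.append(dict(features))
    return rows


if __name__ == "__main__":
    text = ("Asthma is common. We randomized 40 patients. "
            "Symptoms improved. Treatment is effective.")
    for row in featurize(text):
        print(row)
```

Each resulting dict could then be vectorized and passed to a classifier of choice, with one label per sentence (Introduction, Method, Result, or Conclusion).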

Similar articles

1
Categorization of sentence types in medical abstracts.
AMIA Annu Symp Proc. 2003;2003:440-4.
2
Structuralizing biomedical abstracts with discriminative linguistic features.
Comput Biol Med. 2016 Dec 1;79:276-285. doi: 10.1016/j.compbiomed.2016.10.026. Epub 2016 Nov 2.
4
Using argumentation to extract key sentences from biomedical abstracts.
Int J Med Inform. 2007 Feb-Mar;76(2-3):195-200. doi: 10.1016/j.ijmedinf.2006.05.002. Epub 2006 Jul 11.
6
Classification of Clinically Useful Sentences in MEDLINE.
AMIA Annu Symp Proc. 2015 Nov 5;2015:2015-24. eCollection 2015.
8
Improving data retrieval quality: Evidence based medicine perspective.
Int J Risk Saf Med. 2015;27 Suppl 1:S106-7. doi: 10.3233/JRS-150710.
9
Sentence retrieval for abstracts of randomized controlled trials.
BMC Med Inform Decis Mak. 2009 Feb 10;9:10. doi: 10.1186/1472-6947-9-10.
10
Automated confidence ranked classification of randomized controlled trial articles: an aid to evidence-based medicine.
J Am Med Inform Assoc. 2015 May;22(3):707-17. doi: 10.1093/jamia/ocu025. Epub 2015 Feb 5.

Cited by

1
A New Public Corpus for Clinical Section Identification: MedSecId.
Proc Int Conf Comput Ling. 2022 Oct;2022:3709-3721.
2
Research on the structure function recognition of PLOS.
Front Artif Intell. 2024 Jan 24;7:1254671. doi: 10.3389/frai.2024.1254671. eCollection 2024.
4
Translational drug-interaction corpus.
Database (Oxford). 2022 May 18;2022. doi: 10.1093/database/baac031.
6
Combination of conditional random field with a rule based method in the extraction of PICO elements.
BMC Med Inform Decis Mak. 2018 Dec 4;18(1):128. doi: 10.1186/s12911-018-0699-2.
8
DiMeX: A Text Mining System for Mutation-Disease Association Extraction.
PLoS One. 2016 Apr 13;11(4):e0152725. doi: 10.1371/journal.pone.0152725. eCollection 2016.
10
Extracting semantically enriched events from biomedical literature.
BMC Bioinformatics. 2012 May 23;13:108. doi: 10.1186/1471-2105-13-108.
