Categorization of sentence types in medical abstracts.

Authors

McKnight Larry, Srinivasan Padmini

Affiliation

Department of Medical Informatics, Columbia University, New York, NY, USA.

Publication

AMIA Annu Symp Proc. 2003;2003:440-4.

Abstract

This study evaluated the use of machine learning techniques for classifying sentence types. A total of 7253 structured abstracts and 204 unstructured abstracts of randomized controlled trials (RCTs) from MEDLINE were parsed into sentences, and each sentence was labeled as one of four types (Introduction, Method, Result, or Conclusion). Support Vector Machine (SVM) and linear classifier models were generated and evaluated on cross-validated data. Treating sentences as a simple "bag of words", the SVM model had an average ROC area of 0.92. Adding relative sentence location as a feature improved performance markedly for some models and increased the overall average ROC area to 0.95. Linear classifier performance was significantly worse than that of the SVM in all datasets. Using the SVM model trained on structured abstracts to predict unstructured abstracts yielded performance similar to that of models trained with unstructured abstracts in 3 of the 4 types. We conclude that classification of sentence type seems feasible within the domain of RCTs. Identification of sentence types may be helpful for providing context to end users or to other text summarization techniques.
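
The abstract does not describe the implementation, so the following is only a minimal sketch of the two ideas it names: a "bag of words" representation extended with a relative sentence location feature, and the ROC area used as the evaluation measure. The sentence splitter, the function names (`split_sentences`, `sentence_features`, `roc_area`), the `w=` key prefix, and the four-sentence example text are all assumptions made for illustration; no SVM is trained here, and the authors' actual tokenization and scoring may differ.

```python
import re
from collections import Counter


def split_sentences(abstract):
    """Naive splitter (an assumption): break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", abstract.strip())
    return [p for p in parts if p]


def sentence_features(abstract):
    """Return one feature dict per sentence: word counts plus relative position."""
    sentences = split_sentences(abstract)
    n = len(sentences)
    rows = []
    for i, sentence in enumerate(sentences):
        words = re.findall(r"[a-z0-9]+", sentence.lower())
        # Bag of words: word order is discarded, only counts are kept.
        feats = dict(Counter("w=" + w for w in words))
        # Relative location in [0, 1]: 0 for the first sentence, 1 for the last.
        feats["rel_pos"] = i / (n - 1) if n > 1 else 0.0
        rows.append(feats)
    return rows


def roc_area(scores, labels):
    """ROC area as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Invented toy abstract, not taken from the study's data.
example = (
    "Aspirin is widely used. We randomized 100 patients. "
    "Pain scores fell by 30 percent. Aspirin appears effective."
)
rows = sentence_features(example)

# Use the location feature alone as a crude score for "is this a Conclusion?"
scores = [r["rel_pos"] for r in rows]
labels = [0, 0, 0, 1]  # only the last sentence is a Conclusion
print(roc_area(scores, labels))  # prints 1.0
```

In this toy case the location feature alone ranks the single Conclusion sentence above all others, which hints at why adding sentence position could help a classifier. In practice the feature dicts would be passed to a trained model, one per sentence type, and the ROC area computed from that model's cross-validated scores.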
