A novel centroid based sentence classification approach for extractive summarization of COVID-19 news reports.

Author information

Banerjee Sumanta, Mukherjee Shyamapada, Bandyopadhyay Sivaji

Affiliation

Computer Science and Engineering, National Institute of Technology Silchar, Silchar, Assam 788010, India.

Publication information

Int J Inf Technol. 2023;15(4):1789-1801. doi: 10.1007/s41870-023-01221-x. Epub 2023 Mar 24.

DOI:10.1007/s41870-023-01221-x

PMID:37256024

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10036244/

Abstract

A COVID-19 news report covers subtopics such as infections, deaths, the economy, and jobs. The proposed method generates a news summary based on the subtopics that interest the reader. It builds a centroid that captures the lexical pattern of sentences on those subtopics from the words used most frequently in them. The centroid is then used as a query in the vector space model (VSM) for sentence classification and extraction, producing a query-focused summary (QFS) of the documents. Three approaches, TF-IDF, word vector averaging, and an auto-encoder, are evaluated for generating the sentence embeddings used in the VSM. These embeddings are ranked by their similarity to the query embedding. A novel approach is introduced that uses a supervised technique to find the value of the similarity parameter used to classify the sentences. Finally, the performance of the method is assessed in two ways. In the first assessment, all sentences of the dataset are considered together; in the second, the sentences are grouped by document and each group is considered separately, using fivefold cross-validation. With the three sentence encoding approaches, the proposed method achieves mean F1 scores ranging from 0.60 to 0.63 on the test dataset.
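The pipeline described in the abstract can be illustrated with a minimal sketch of its TF-IDF variant. This is not the authors' implementation: the abstract does not specify how the centroid is sized, how TF-IDF is smoothed, or which supervised technique sets the similarity parameter. The sketch assumes the centroid is the `top_k` most frequent non-stopword terms in on-topic sentences, uses a smoothed IDF, and picks the cosine-similarity threshold that maximizes F1 on labeled training sentences. The stopword list, the toy sentences, and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list, kept tiny for the illustration.
STOPWORDS = {"a", "the", "in", "and", "from", "this", "of", "as", "after", "was"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def build_centroid(topic_sentences, top_k=2):
    """Assumed centroid: the top_k most frequent words in on-topic sentences."""
    counts = Counter(w for s in topic_sentences for w in tokenize(s))
    return [w for w, _ in counts.most_common(top_k)]


def idf_table(sentences):
    """Smoothed IDF, treating each sentence as one document."""
    n = len(sentences)
    df = Counter(w for s in sentences for w in set(tokenize(s)))
    return {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}


def tfidf(tokens, idf):
    """Sparse TF-IDF vector as a dict; words unseen in training are dropped."""
    tf = Counter(t for t in tokens if t in idf)
    return {w: c * idf[w] for w, c in tf.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def f1(pred, gold):
    """F1 score for binary predictions."""
    tp = sum(1 for p, g in zip(pred, gold) if p and g)
    fp = sum(1 for p, g in zip(pred, gold) if p and not g)
    fn = sum(1 for p, g in zip(pred, gold) if not p and g)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def fit_threshold(scores, labels):
    """Assumed supervised step: keep the lowest threshold with the best F1."""
    best_t, best_f1 = 0.0, -1.0
    for t in sorted(set(scores)):
        current = f1([s >= t for s in scores], labels)
        if current > best_f1:
            best_t, best_f1 = t, current
    return best_t


# Toy training sentences: label 1 = on a subtopic of interest (infections, deaths).
train = [
    ("New infections rose sharply in the city this week.", 1),
    ("Deaths from the virus reached a record high.", 1),
    ("Hospitals reported more infections and deaths among older patients.", 1),
    ("The stock market recovered after the announcement.", 0),
    ("Many workers lost jobs as factories closed.", 0),
]
sentences = [s for s, _ in train]
labels = [y for _, y in train]

centroid = build_centroid([s for s, y in train if y == 1], top_k=2)
idf = idf_table(sentences)
query = tfidf(centroid, idf)  # the centroid acts as the VSM query


def score(sentence):
    """Similarity between a sentence embedding and the query embedding."""
    return cosine(tfidf(tokenize(sentence), idf), query)


threshold = fit_threshold([score(s) for s in sentences], labels)

# Classify the sentences of an unseen document and extract the summary.
document = [
    "Officials confirmed new infections and deaths in the region.",
    "The market gained after jobs data was released.",
]
summary = [s for s in document if score(s) >= threshold]
print(centroid, round(threshold, 3), summary)
```

On this toy data the extracted summary keeps only the first document sentence, because the second shares no terms with the centroid. Replacing `tfidf` with averaged word vectors or an auto-encoder output would give the other two encodings named in the abstract while leaving the ranking and thresholding steps unchanged.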

Similar articles

Improving extractive document summarization with sentence centrality.

PLoS One. 2022 Jul 22;17(7):e0268278. doi: 10.1371/journal.pone.0268278. eCollection 2022.

Extractive single document summarization using binary differential evolution: Optimization of different sentence quality measures.

PLoS One. 2019 Nov 14;14(11):e0223477. doi: 10.1371/journal.pone.0223477. eCollection 2019.

N-GPETS: Neural Attention Graph-Based Pretrained Statistical Model for Extractive Text Summarization.

Comput Intell Neurosci. 2022 Nov 22;2022:6241373. doi: 10.1155/2022/6241373. eCollection 2022.

Quantifying the informativeness for biomedical literature summarization: An itemset mining method.

Comput Methods Programs Biomed. 2017 Jul;146:77-89. doi: 10.1016/j.cmpb.2017.05.011. Epub 2017 May 27.

Abstractive text summarization of low-resourced languages using deep learning.

PeerJ Comput Sci. 2023 Jan 13;9:e1176. doi: 10.7717/peerj-cs.1176. eCollection 2023.

Beyond SumBasic: Task-focused summarization with sentence simplification and lexical expansion.

Inf Process Manag. 2007 Nov;43(6):1606-1618. doi: 10.1016/j.ipm.2007.01.023. Epub 2007 Apr 19.

Graph-based extractive text summarization method for Hausa text.

PLoS One. 2023 May 9;18(5):e0285376. doi: 10.1371/journal.pone.0285376. eCollection 2023.

Multi-granularity heterogeneous graph attention networks for extractive document summarization.

Neural Netw. 2022 Nov;155:340-347. doi: 10.1016/j.neunet.2022.08.021. Epub 2022 Sep 5.

Multiview Convolutional Neural Networks for Multidocument Extractive Summarization.

IEEE Trans Cybern. 2017 Oct;47(10):3230-3242. doi: 10.1109/TCYB.2016.2628402. Epub 2016 Nov 28.
