Ehnert Philip, Schröter Julian
iits-consulting/ImpressSol GmbH, Department of Artificial Intelligence, Au in der Hallertau, Germany.
FOM-Hochschule für Oekonomie und Management GmbH, Department of Business Informatics, Bonn, Germany.
Front Artif Intell. 2024 Mar 20;7:1200949. doi: 10.3389/frai.2024.1200949. eCollection 2024.
Identifying key statements in large volumes of short, user-generated texts is essential for decision-makers who need to grasp their content quickly. To address this need, this research introduces a novel abstractive key point generation (KPG) approach that works on unlabeled text corpora in an unsupervised manner, a feature not yet seen in existing abstractive KPG methods. The proposed method combines topic modeling, used for unsupervised segmentation of the data space, with abstractive summarization to efficiently generate semantically representative key points from text collections. Hyperparameter tuning is applied to optimize both the topic modeling and the abstractive summarization. For topic modeling, the tuning aims to make cluster assignment more deterministic, since the probabilistic nature of the process would otherwise lead to high variability in the output. For abstractive summarization, the tuning uses a Davies-Bouldin Index specifically adapted to this use case, so that the generated key points more accurately reflect the characteristic properties of their clusters. In addition, our research recommends an automated evaluation that provides a quantitative complement to the traditional qualitative analysis of KPG. This evaluation treats KPG as a specialized form of multi-document summarization (MDS) and employs both word-based and word-embedding-based metrics. These criteria allow a comprehensive and nuanced analysis of the KPG output. Although the approach is demonstrated on a political debate on Twitter, it is applicable to various domains, such as product review analysis and survey evaluation. This research not only paves the way for innovative development in abstractive KPG methods but also sets a benchmark for their evaluation.
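The abstract does not define the Davies-Bouldin Index (DBI) it builds on. The following is a minimal sketch of the *standard* index, not the use-case-specific adaptation described above, which is not reproduced here.

The sketch makes the following assumptions:
- Texts (or generated key points) have already been embedded as numeric vectors.
- Those vectors have already been assigned to topic clusters.
- The function name `davies_bouldin` is hypothetical.

For each cluster, the index compares within-cluster scatter to the distance between centroids. Lower values indicate more compact, better-separated clusters.

```python
import math
from collections import defaultdict
from typing import Sequence


def _centroid(points: list[list[float]]) -> list[float]:
    # Component-wise mean of the cluster's vectors.
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def davies_bouldin(points: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Standard Davies-Bouldin index (hypothetical helper; lower is better)."""
    clusters: dict[int, list[list[float]]] = defaultdict(list)
    for p, lab in zip(points, labels):
        clusters[lab].append(list(p))
    if len(clusters) < 2:
        raise ValueError("the Davies-Bouldin index needs at least two clusters")

    ids = sorted(clusters)
    centroids = {i: _centroid(clusters[i]) for i in ids}
    # s_i: mean Euclidean distance of cluster members to their own centroid.
    scatter = {
        i: sum(math.dist(p, centroids[i]) for p in clusters[i]) / len(clusters[i])
        for i in ids
    }

    total = 0.0
    for i in ids:
        worst = 0.0
        for j in ids:
            if i == j:
                continue
            # d_ij: distance between the two centroids.
            sep = math.dist(centroids[i], centroids[j])
            # R_ij = (s_i + s_j) / d_ij; coinciding centroids count as infinitely bad.
            ratio = math.inf if sep == 0 else (scatter[i] + scatter[j]) / sep
            worst = max(worst, ratio)
        total += worst  # max_{j != i} R_ij
    return total / len(ids)  # average over all clusters


# Two compact clusters whose centroids lie 10 units apart.
print(davies_bouldin([(0, 0), (0, 2), (10, 0), (10, 2)], [0, 0, 1, 1]))  # → 0.2
```

Moving the two clusters closer together (for example, centroids 3 units apart) raises the score, which is the behavior an optimization would try to avoid.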