TIME Research Area, School of Business and Economics, RWTH Aachen University, Aachen, Germany.
Strategy and Entrepreneurship Area, School of Business, Wake Forest University, Winston-Salem, NC, United States of America.
PLoS One. 2022 Apr 28;17(4):e0266325. doi: 10.1371/journal.pone.0266325. eCollection 2022.
Topic modeling is a popular technique for exploring large document collections. It has proven useful for this task, but its application poses a number of challenges. First, comparing the available algorithms is anything but simple, because researchers evaluate them on many different datasets and with many different criteria. A second challenge is the choice of a suitable metric for evaluating the calculated results. The metrics used so far paint a mixed picture, which makes it difficult to verify the accuracy of topic modeling outputs. Altogether, the choice of an appropriate algorithm and the evaluation of the results remain unresolved issues. Although many studies have reported promising performance by various topic models, prior research has not yet systematically and comprehensively investigated the validity of these outcomes, that is, using more than a small number of the available algorithms and metrics. Consequently, our study has two main objectives. First, we compare all commonly used, non-application-specific topic modeling algorithms and assess their relative performance. The comparison is made against a known clustering and thus enables an unbiased evaluation of results. Our findings show a clear ranking of the algorithms in terms of accuracy. Second, we analyze the relationship between existing metrics and the known clustering, and thus objectively determine under what conditions these algorithms may be utilized effectively. In this way, we give readers a deeper understanding of the performance of topic modeling techniques and of the interplay between performance and evaluation metrics.
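The abstract does not state which agreement measure the authors use to compare topic model output with the known clustering. As a hedged illustration only, one common way to score a topic model against reference labels is to assign each document to its most probable topic and then compute the normalized mutual information (NMI) between those assignments and the known clusters. The sketch below assumes a document-topic probability matrix is already available from some topic model; the function names `dominant_topics` and `nmi` and the toy data are hypothetical, and this is not presented as the study's evaluation procedure.

```python
import math
from collections import Counter


def dominant_topics(doc_topic):
    """Hard-assign each document to its highest-probability topic.

    Hypothetical helper: doc_topic is a list of rows, one per document,
    each row holding that document's topic probabilities.
    """
    return [max(range(len(row)), key=row.__getitem__) for row in doc_topic]


def nmi(labels_true, labels_pred):
    """Normalized mutual information with arithmetic-mean normalization.

    Returns a value in [0, 1]; 1.0 means the two labelings are identical
    up to a renaming of the cluster/topic IDs.
    """
    n = len(labels_true)
    if n == 0 or n != len(labels_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    joint = Counter(zip(labels_true, labels_pred))

    # Mutual information between the two labelings (natural log).
    mi = 0.0
    for (t, p), n_tp in joint.items():
        mi += (n_tp / n) * math.log(n * n_tp / (count_true[t] * count_pred[p]))

    # Entropy of each labeling.
    h_true = -sum((c / n) * math.log(c / n) for c in count_true.values())
    h_pred = -sum((c / n) * math.log(c / n) for c in count_pred.values())

    if h_true == 0.0 and h_pred == 0.0:
        return 1.0  # both labelings put everything in one cluster: identical
    return mi / ((h_true + h_pred) / 2)


if __name__ == "__main__":
    # Toy, hypothetical data: 4 documents, 2 topics, 2 known categories.
    doc_topic = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9]]
    known = ["sports", "sports", "politics", "politics"]
    print(nmi(known, dominant_topics(doc_topic)))
```

Because NMI ignores how the topic IDs are named, a model that recovers the known clusters perfectly scores 1.0 even though its topic numbers never match the category names; a labeling unrelated to the known clusters scores close to 0.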