Strength in Numbers: Using Big Data to Simplify Sentiment Classification.

Author Affiliations

1 IOMS Department, NYU Stern School of Business, New York, New York.

2 School of Business, Stevens Institute of Technology, Hoboken, New Jersey.

Publication Information

Big Data. 2017 Sep;5(3):256-271.

Abstract

Sentiment classification, the task of assigning a positive or negative label to a text segment, is a key component of mainstream applications such as reputation monitoring, sentiment summarization, and item recommendation. Even though the performance of sentiment classification methods has steadily improved over time, their ever-increasing complexity renders them comprehensible by only a shrinking minority of expert practitioners. For all others, such highly complex methods are black-box predictors that are hard to tune and even harder to justify to decision makers. Motivated by these shortcomings, we introduce BigCounter: a new algorithm for sentiment classification that substitutes algorithmic complexity with Big Data. Our algorithm combines standard data structures with statistical testing to deliver accurate and interpretable predictions. It is also parameter free and suitable for use virtually "out of the box," which makes it appealing for organizations wanting to leverage their troves of unstructured data without incurring the significant expense of creating in-house teams of data scientists. Finally, BigCounter's efficient and parallelizable design makes it applicable to very large data sets. We apply our method on such data sets toward a study on the limits of Big Data for sentiment classification. Our study finds that, after a certain point, predictive performance tends to converge and additional data have little benefit. Our algorithmic design and findings provide the foundations for future research on the data-over-computation paradigm for classification problems.
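
The abstract does not describe BigCounter's internals, so the block below is not the published algorithm. It is a minimal sketch of the general idea of pairing a standard data structure (a hash-based counter) with a statistical test to get interpretable sentiment predictions. The function names `train`, `classify`, and `binom_two_sided_p`, the whitespace tokenizer, the exact binomial test, the balanced-classes null rate of 0.5, and the `alpha` threshold are all assumptions made for illustration. Note that `alpha` is itself a parameter, whereas the abstract describes BigCounter as parameter free.

```python
from collections import Counter
from math import comb


def binom_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: total probability of all outcomes
    that are no more likely than observing k successes in n trials."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    return min(1.0, sum(q for q in pmf if q <= pmf[k] * (1 + 1e-9)))


def train(labeled_texts):
    """Count, per token, how many positive and negative texts contain it."""
    pos, neg = Counter(), Counter()
    for text, label in labeled_texts:
        tokens = set(text.lower().split())  # hypothetical tokenizer
        (pos if label == "pos" else neg).update(tokens)
    return pos, neg


def classify(text, pos, neg, alpha=0.05):
    """Let each token whose class counts differ significantly cast one vote.

    Assumes roughly balanced classes, so the null rate is 0.5.
    Returns the label plus the tokens that justified it.
    """
    score, evidence = 0, []
    for tok in set(text.lower().split()):
        k, n = pos[tok], pos[tok] + neg[tok]
        if n == 0:
            continue  # token never seen in training
        if binom_two_sided_p(k, n) < alpha:
            score += 1 if k > n - k else -1
            evidence.append((tok, k, n - k))
    label = "pos" if score > 0 else "neg" if score < 0 else "unknown"
    return label, evidence


# Toy training set, invented for this example.
training = [("great phone", "pos")] * 6 + [("awful phone", "neg")] * 6
pos_counts, neg_counts = train(training)
print(classify("great battery", pos_counts, neg_counts))
```

In this sketch the returned evidence list is what makes the prediction interpretable: every vote can be traced to a token and its raw positive and negative counts. The counting step is also a plain sum over documents, so counts computed on separate shards of a large data set can be merged by adding the counters.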
