A method for optimizing text preprocessing and text classification using multiple cycles of learning with an application on shipbrokers emails.

Authors

Papageorgiou Grigorios, Economou Polychronis, Bersimis Sotirios

Affiliations

Department of Civil Engineering, University of Patras, Patras, Greece.

Department of Business Administration, University of Piraeus, Piraeus, Greece.

Publication information

J Appl Stat. 2024 Jan 30;51(13):2592-2626. doi: 10.1080/02664763.2024.2307535. eCollection 2024.

Abstract

Optimizing text preprocessing and text classification algorithms is an important, everyday task in large organizations and companies and it usually involves a labor-intensive and time-consuming effort. For example, the filtering and sorting of a large number of electronic mails (emails) are crucial to keeping track of the received information and converting it automatically into useful and profitable knowledge. Business emails are often unstructured, noisy, and with many abbreviations and acronyms, which makes their handling a challenging procedure. To overcome those challenges, a two-step classification approach is proposed, along with a two-cycle labeling procedure in order to speed up the labeling process. Every step incorporates a heuristic classification approach to assign emails to predefined classes by comparing several classification and text vectorization algorithms. These algorithms are compared and evaluated using the F1 score and balanced accuracy. The implementation of the proposed algorithm is demonstrated in a shipbroker agent operating in Greece with excellent performance, improving organization and administration while reducing expenses.
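As a rough illustration of the two evaluation metrics named in the abstract, the sketch below computes an F1 score and balanced accuracy for a multi-class labeling task in plain Python. The macro averaging of F1, the class names ("cargo", "vessel", "other"), and the toy predictions are assumptions made for this example only; the abstract does not state the averaging scheme, the class set, or any data.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count true positives, false positives and false negatives per class."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    return tp, fp, fn


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (macro averaging is an assumption here)."""
    tp, fp, fn = confusion_counts(y_true, y_pred)
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall over the classes that occur in y_true."""
    tp, _, fn = confusion_counts(y_true, y_pred)
    labels = sorted(set(y_true))
    recalls = [tp[c] / (tp[c] + fn[c]) for c in labels]
    return sum(recalls) / len(recalls)


# Hypothetical email classes and predictions, for illustration only.
y_true = ["cargo", "cargo", "vessel", "vessel", "other", "other"]
y_pred = ["cargo", "vessel", "vessel", "vessel", "other", "cargo"]

print("macro F1:", round(macro_f1(y_true, y_pred), 4))
print("balanced accuracy:", round(balanced_accuracy(y_true, y_pred), 4))
```

Both metrics average over classes rather than over individual emails, so a rare class counts as much as a frequent one. That property is a common reason to prefer them over plain accuracy when class sizes are uneven.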

Similar articles

Sequence Labeling for Disambiguating Medical Abbreviations.
J Healthc Inform Res. 2023 Sep 14;7(4):501-526. doi: 10.1007/s41666-023-00146-1. eCollection 2023 Dec.

Evading obscure communication from spam emails.
Math Biosci Eng. 2022 Jan;19(2):1926-1943. doi: 10.3934/mbe.2022091. Epub 2021 Dec 22.