PPGGI, Universidade Federal do Paraná, Curitiba, State of Paraná, Brazil.
PPGPSI, Universidade Federal do Paraná, Curitiba, State of Paraná, Brazil.
PLoS One. 2023 Feb 9;18(2):e0281323. doi: 10.1371/journal.pone.0281323. eCollection 2023.
Several studies applying Machine Learning to deception detection have been published in the last decade, and a rich and complex set of settings, approaches, theories, and results is now available. As a result, it can be difficult to identify trends, successful paths, gaps, and opportunities for contribution. The present literature review aims to describe the state of research on deception detection with Machine Learning. We followed the PRISMA protocol and retrieved 648 articles from ACM Digital Library, IEEE Xplore, Scopus, and Web of Science. After 108 duplicates were removed, the remaining 540 were screened. A final corpus of 81 documents was summarized as mind maps. Metadata was extracted and encoded as Python dictionaries to support a statistical analysis scripted in Python, which is available as a collection of Jupyter Lab Notebooks in a GitHub repository. Neural Networks, Support Vector Machines, Random Forest, Decision Tree, and K-Nearest Neighbors are the five most explored techniques. The studies report detection performance ranging from 51% to 100%, with 19 works reaching an accuracy above 90%. Monomodal, Bimodal, and Multimodal approaches were all explored and achieved varying levels of detection accuracy. Bimodal and Multimodal approaches have become a trend over Monomodal ones, although there are high-performance examples of the latter. Of the studies that exploit language and linguistic features, 75% are dedicated to English. The findings include observations on the following topics: language and culture, emotional features, psychological traits, cognitive load, facial cues, complexity, performance, and Machine Learning. We also present a dataset benchmark. The main conclusions are that labeled datasets built from real-life data are scarce and that there is still room for new approaches to deception detection with Machine Learning, especially approaches focused on languages and cultures other than English.
Further research would contribute greatly by providing new labeled, multimodal datasets for deception detection, in English as well as in other languages.
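The abstract states that the extracted metadata was encoded as Python dictionaries to drive the statistical analysis. The authors' notebooks are not reproduced here. What follows is only a minimal sketch of how such dictionaries could yield the kinds of figures reported above: the PRISMA screening count, the most explored techniques, the number of studies with accuracy above 90%, and the share of linguistic studies devoted to English. The per-study schema (`StudyRecord` with `techniques`, `modality`, `languages`, and `accuracy` fields), the helper function names, and the four-study toy corpus are all hypothetical illustrations, not data from the review.

```python
from collections import Counter
from typing import TypedDict


class StudyRecord(TypedDict):
    """Hypothetical per-study metadata schema; field names are illustrative only."""
    study_id: str
    techniques: list[str]  # ML techniques explored in the study
    modality: str          # "monomodal", "bimodal" or "multimodal"
    languages: list[str]   # languages of linguistic features; empty if none used
    accuracy: float        # best reported accuracy, in [0, 1]


def prisma_screened(retrieved: int, duplicates: int) -> int:
    """Records left for screening once duplicates are removed."""
    return retrieved - duplicates


def technique_ranking(corpus: list[StudyRecord], top: int = 5) -> list[tuple[str, int]]:
    """Rank techniques by the number of studies exploring them.

    dict.fromkeys de-duplicates each study's list while keeping order,
    so a technique is counted at most once per study.
    """
    counts = Counter(t for study in corpus for t in dict.fromkeys(study["techniques"]))
    return counts.most_common(top)


def high_accuracy_count(corpus: list[StudyRecord], threshold: float = 0.9) -> int:
    """Number of studies whose reported accuracy is strictly above the threshold."""
    return sum(1 for study in corpus if study["accuracy"] > threshold)


def language_share(corpus: list[StudyRecord], language: str = "English") -> float:
    """Fraction of linguistic-feature studies that cover the given language."""
    linguistic = [s for s in corpus if s["languages"]]
    if not linguistic:
        return 0.0
    return sum(language in s["languages"] for s in linguistic) / len(linguistic)


# Hypothetical toy corpus: every value below is invented for illustration.
corpus: list[StudyRecord] = [
    {"study_id": "S1", "techniques": ["SVM", "Random Forest"],
     "modality": "monomodal", "languages": ["English"], "accuracy": 0.78},
    {"study_id": "S2", "techniques": ["Neural Network", "SVM"],
     "modality": "multimodal", "languages": ["English"], "accuracy": 0.95},
    {"study_id": "S3", "techniques": ["Neural Network"],
     "modality": "bimodal", "languages": [], "accuracy": 0.91},
    {"study_id": "S4", "techniques": ["Decision Tree", "K-Nearest Neighbors", "SVM"],
     "modality": "monomodal", "languages": ["Portuguese"], "accuracy": 0.51},
]

if __name__ == "__main__":
    print(prisma_screened(648, 108))          # 540, matching the abstract's counts
    print(technique_ranking(corpus, top=2))   # [('SVM', 3), ('Neural Network', 2)]
    print(high_accuracy_count(corpus))        # 2
    print(f"{language_share(corpus):.2f}")    # 0.67
```

Keeping one flat dictionary per study lets each statistic be a short aggregation over the corpus. A real analysis would populate the list from the extracted metadata rather than from hand-written toy records.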