Loconte Riccardo, Russo Roberto, Capuozzo Pasquale, Pietrini Pietro, Sartori Giuseppe
Molecular Mind Lab, IMT School for Advanced Studies Lucca, Piazza San Francesco 19, 55100, Lucca, LU, Italy.
Department of Mathematics "Tullio Levi-Civita", University of Padova, Padova, Italy.
Sci Rep. 2023 Dec 21;13(1):22849. doi: 10.1038/s41598-023-50214-0.
Human accuracy in detecting deception through intuitive judgments has been shown not to exceed chance level. Several automated verbal lie-detection techniques employing Machine Learning and Transformer models have therefore been developed to reach higher accuracy. This study is the first to explore the performance of a Large Language Model, FLAN-T5 (small and base sizes), on a lie-detection classification task across three English-language datasets encompassing personal opinions, autobiographical memories, and future intentions. After performing stylometric analysis to describe linguistic differences across the three datasets, we tested the small and base FLAN-T5 models in three Scenarios using 10-fold cross-validation: Scenario 1, with training and test sets drawn from the same dataset; Scenario 2, with the training set drawn from two datasets and the test set from the remaining third; and Scenario 3, with training and test sets drawn from all three datasets. We reached state-of-the-art results in Scenarios 1 and 3, outperforming previous benchmarks. The results also revealed that performance depended on model size, with the larger model exhibiting higher accuracy. Furthermore, stylometric analysis was used for explainability, indicating that linguistic features associated with the Cognitive Load framework may influence the model's predictions.
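To make the evaluation protocol concrete, the sketch below illustrates one way to frame verbal lie detection as a text-to-text task with FLAN-T5 and to score it with 10-fold cross-validation, as in Scenario 1 (train and test folds from the same dataset). This is a minimal illustration, not the authors' pipeline: the prompt wording, toy data, zero-shot prediction (the study fine-tunes the model), and library choices (Hugging Face transformers, scikit-learn) are assumptions.

```python
# Minimal sketch (not the authors' code): lie detection as text-to-text
# classification with FLAN-T5, evaluated with 10-fold cross-validation.
# Dataset, prompt wording, and hyperparameters are illustrative assumptions.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score
from transformers import T5Tokenizer, T5ForConditionalGeneration

MODEL_NAME = "google/flan-t5-small"  # or "google/flan-t5-base"
tokenizer = T5Tokenizer.from_pretrained(MODEL_NAME)
model = T5ForConditionalGeneration.from_pretrained(MODEL_NAME)

def predict_label(statement: str) -> str:
    """Zero-shot prediction; in the study the model is fine-tuned per fold."""
    prompt = f"Is the following statement truthful or deceptive? Statement: {statement}"
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
    outputs = model.generate(**inputs, max_new_tokens=4)
    return tokenizer.decode(outputs[0], skip_special_tokens=True).strip().lower()

# Toy data standing in for the opinions / memories / intentions datasets.
texts = ["I genuinely support this policy.", "I have never met that person."] * 10
labels = ["truthful", "deceptive"] * 10

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
fold_accuracies = []
for train_idx, test_idx in skf.split(texts, labels):
    # Scenario 1: train and test folds come from the same dataset.
    # (Fine-tuning on the train fold is omitted here for brevity.)
    preds = [predict_label(texts[i]) for i in test_idx]
    gold = [labels[i] for i in test_idx]
    fold_accuracies.append(accuracy_score(gold, preds))

print(f"Mean 10-fold accuracy: {np.mean(fold_accuracies):.3f}")
```

Scenarios 2 and 3 would differ only in how the folds are assembled: pooling two datasets for training with the third held out as the test set, or pooling all three for both training and testing.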