ICFO - Institut de Ciencies Fotoniques, The Barcelona Institute of Science and Technology, Av. Carl Friedrich Gauss 3, 08860, Castelldefels, Barcelona, Spain.
Department of Cognitive Science and Artificial Intelligence, Tilburg University, Warandelaan 2, 5037 AB, Tilburg, The Netherlands.
Sci Rep. 2020 Oct 12;10(1):16983. doi: 10.1038/s41598-020-73622-y.
We address the problem of predicting user intent from the clickstream data of an e-commerce website via two conceptually different approaches: classification based on hand-crafted features and classification based on deep learning. In both approaches, we deliberately coarse-grain a new proprietary clickstream dataset to produce symbolic trajectories with minimal information. We then tackle the classification of trajectories of arbitrary length and, ultimately, the early prediction of limited-length trajectories, for both balanced and unbalanced datasets. Our analysis shows that k-gram statistics combined with visibility graph motifs produce fast and accurate classifications, highlighting that purchase prediction is reliable even for extremely short observation windows. In the deep learning case, we benchmark previous state-of-the-art (SOTA) models on the new dataset and improve classification accuracy over SOTA performance with our proposed LSTM architecture. We conclude with an in-depth error analysis and a careful evaluation of the pros and cons of the two approaches when applied to realistic industry use cases.
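As a rough illustration of the k-gram statistics mentioned above, the following Python sketch turns one symbolic trajectory into a fixed-length vector of relative k-gram frequencies. The symbol alphabet, the example session, and the function name `kgram_features` are assumptions made for illustration only; the abstract does not specify the coarse-graining or the exact feature definition, and visibility graph motifs are not covered here.

```python
from collections import Counter
from itertools import product


def kgram_features(trajectory, alphabet, k=2):
    """Relative frequency of every possible k-gram, in a fixed order.

    `trajectory` is a sequence of symbols drawn from `alphabet`.
    The output has len(alphabet) ** k entries, so trajectories of
    different lengths map to vectors of the same size.
    """
    # Slide a window of length k over the trajectory.
    windows = [tuple(trajectory[i:i + k])
               for i in range(len(trajectory) - k + 1)]
    counts = Counter(windows)
    total = len(windows)
    # Trajectories shorter than k yield an all-zero vector.
    return [counts[gram] / total if total else 0.0
            for gram in product(alphabet, repeat=k)]


# Hypothetical coarse-grained session (symbol meanings are assumed):
# 1 = page view, 2 = product detail, 3 = add to cart
session = [1, 1, 2, 3, 2, 1]
features = kgram_features(session, alphabet=(1, 2, 3), k=2)
```

Vectors of this kind could then be passed to any standard classifier; the fixed ordering from `itertools.product` keeps feature positions comparable across sessions.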