Li Zhenhui, Zhou Fan, Wang Zhiyuan, Xu Xovee, Liu Leyuan, Yin Guangqiang
University of Electronic Science and Technology of China, Chengdu, 610054, China.
Kash Institute of Electronics and Information Industry, Kashi, 844000, China.
Sci Rep. 2024 Mar 1;14(1):5144. doi: 10.1038/s41598-024-55750-x.
Understanding user behavior via IP addresses is a crucial step toward numerous practical IP-based applications, including online content delivery, fraud prevention, and marketing intelligence. While profiling IP addresses through methods such as IP geolocation and anomaly detection has been thoroughly studied, the function of an IP address (e.g., whether it belongs to a private enterprise network or a home broadband connection) remains underexplored. In this work, we make the first attempt to address the IP usage scenario classification problem. We collect data consisting of IP addresses from four large-scale regions. We propose a novel continuous neural tree-based ensemble model to learn IP assignment rules and complex feature interactions. We conduct extensive experiments to evaluate our model in terms of classification accuracy and generalizability. Our results demonstrate that the proposed model efficiently uncovers significant higher-order feature interactions that improve IP usage scenario classification, and that it generalizes from a source region to a target region.