Ho Chi Minh City Open University, Ho Chi Minh City, Viet Nam.
School of Computer Science, Technological University Dublin, Grangegorman, Dublin 7, Ireland.
Child Abuse Negl. 2024 Jan;147:106558. doi: 10.1016/j.chiabu.2023.106558. Epub 2023 Dec 1.
The production, distribution, and discussion of child sexual abuse materials (CSAM) often take place on the dark web, where offenders stay hidden from search engines and evade detection by law enforcement agencies. CSAM creators on the dark web also employ a variety of techniques to avoid detection and conceal their activities. The large volume of CSAM on the dark web is a global social problem and poses a significant challenge for helplines, hotlines, and law enforcement agencies.
To identify CSAM discussions on the dark web and to uncover metadata insights into the characteristics, behaviors, and motivations of CSAM creators.
We analyzed more than 353,000 posts, generated by 35,400 distinct users and written in 118 different languages, across eight dark web forums in 2022. Approximately 221,000 of these posts were written in English, contributed by around 29,500 unique users.
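The abstract does not name the language identification tool used to isolate the English subset. As a rough illustration only, the sketch below assumes the langdetect package and a hypothetical per-post dict schema; neither is confirmed by the paper.

```python
# Illustrative sketch of the corpus-filtering step: count posts per
# detected language and keep the English subset. The langdetect package
# and the {"user", "text"} post schema are assumptions, not the
# authors' actual tooling.
from collections import Counter
from langdetect import detect
from langdetect.lang_detect_exception import LangDetectException

def summarize_corpus(posts):
    """Count posts per detected language and collect English posts."""
    language_counts = Counter()
    english_posts = []
    for post in posts:
        try:
            lang = detect(post["text"])
        except LangDetectException:  # empty or undecidable text
            continue
        language_counts[lang] += 1
        if lang == "en":
            english_posts.append(post)
    return language_counts, english_posts

counts, english = summarize_corpus(
    [{"user": "u1", "text": "Hello world, this is an English post."}])
print(len(counts), "languages;", len(english), "English posts")
```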
We propose a CSAM detection intelligence system. The system uses a manually labeled dataset to train, evaluate, and select an efficient CSAM classification model. Once CSAM creators and victims are identified through CSAM posts on the dark web, we analyze and visualize the associated metadata to uncover information about the behaviors of CSAM creators and victims.
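The paper's feature extraction and implementation details are not given in the abstract. The following is a minimal sketch of the train/evaluate/select step, assuming TF-IDF features and scikit-learn's LinearSVC and MultinomialNB as stand-ins for the Support Vector Machine and Naive Bayes models; the post texts and labels are placeholders.

```python
# Minimal sketch of training and comparing candidate CSAM classifiers.
# TF-IDF features and scikit-learn estimators are assumptions; the
# authors' actual pipeline is not published in the abstract.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import precision_score, recall_score, accuracy_score

# Placeholder corpus: 1 = CSAM, 0 = non-CSAM (illustrative only).
posts = [
    "placeholder harmful post text A", "placeholder harmful post text B",
    "placeholder harmful post text C", "placeholder harmful post text D",
    "placeholder benign post text A", "placeholder benign post text B",
    "placeholder benign post text C", "placeholder benign post text D",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

X_train, X_test, y_train, y_test = train_test_split(
    posts, labels, test_size=0.25, random_state=42, stratify=labels)

candidates = {
    "svm": make_pipeline(TfidfVectorizer(), LinearSVC()),
    "naive_bayes": make_pipeline(TfidfVectorizer(), MultinomialNB()),
}

# Fit each candidate and report the metrics used for model selection.
for name, model in candidates.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print(name,
          "precision:", precision_score(y_test, pred, zero_division=0),
          "recall:", recall_score(y_test, pred, zero_division=0),
          "accuracy:", accuracy_score(y_test, pred))
```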
The CSAM classifier based on a Support Vector Machine model exhibited good performance, achieving the highest precision (92.3 %) and accuracy (87.6 %), while the Naive Bayes combination achieved the best recall (89 %). Across the eight forums in 2022, our Support Vector Machine model detected around 63,000 English CSAM posts and identified nearly 10,500 creators of English CSAM posts. Analysis of the metadata of CSAM posts revealed meaningful information about CSAM creators, their victims, and the social media platforms they used. This included: (1) the topics of interest and the preferred social media platforms of the 20 most active CSAM creators (for example, the two top creators were interested in topics such as video, webcam, and general forum content, and frequently used platforms such as Omegle and Skype); (2) the ages and nationalities of the victims typically mentioned by CSAM creators, such as victims aged 12 and 13 with nationalities including British and Russian; (3) the social media platforms preferred by CSAM creators for sharing or uploading CSAM, including Omegle, YouTube, Skype, Instagram, and Telegram.
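The extraction rules behind these metadata statistics are not published in the abstract. As a hypothetical illustration, the sketch below counts platform mentions and stated victim ages in posts already classified as CSAM; the keyword list and age pattern are assumptions, not the authors' method.

```python
# Hypothetical sketch of the metadata-aggregation step: tally social
# media platform mentions and stated ages across detected CSAM posts.
# The PLATFORMS list and AGE_PATTERN regex are illustrative assumptions.
import re
from collections import Counter

PLATFORMS = ["omegle", "youtube", "skype", "instagram", "telegram"]
AGE_PATTERN = re.compile(r"\b(1[0-7]|[5-9])\s*(?:yo|y/o|years? old)\b",
                         re.IGNORECASE)

def aggregate_metadata(csam_posts):
    """Count platform mentions and stated ages across detected posts."""
    platform_counts, age_counts = Counter(), Counter()
    for text in csam_posts:
        lowered = text.lower()
        platform_counts.update(p for p in PLATFORMS if p in lowered)
        age_counts.update(int(m.group(1))
                          for m in AGE_PATTERN.finditer(text))
    return platform_counts, age_counts

platforms, ages = aggregate_metadata(
    ["... 12 yo ... telegram ...", "... skype and omegle ..."])
print(platforms.most_common(5), ages.most_common(5))
```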
Our CSAM detection system achieves high precision, recall, and accuracy when classifying CSAM and non-CSAM posts in real time. It can also extract and visualize valuable and unique insights about CSAM creators and victims using advanced statistical methods. These insights benefit our partners, i.e., national hotlines and child agencies.