Unsupervised, low latency anomaly detection of algorithmically generated domain names by generative probabilistic modeling.

Author information

Department of Electrical Engineering, Pennsylvania State University, University Park, PA 16802, USA.

Department of Electrical Engineering, Pennsylvania State University, University Park, PA 16802, USA; Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA 16802, USA.

Publication information

J Adv Res. 2014 Jul;5(4):423-33. doi: 10.1016/j.jare.2014.01.001. Epub 2014 Jan 9.

DOI:10.1016/j.jare.2014.01.001

PMID:25685511

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4294760/

Abstract

We propose a method for detecting anomalous domain names, with a focus on algorithmically generated domain names, which are frequently associated with malicious activities such as fast flux service networks, particularly for bot networks (or botnets), malware, and phishing. Our method learns a (null hypothesis) probability model from a large set of domain names that have been whitelisted by some reliable authority. Since these names are mostly assigned by humans, they are pronounceable, tend to have distributions of characters, words, word lengths, and number of words that are typical of some language (mostly English), and often consist of words drawn from a known lexicon. By contrast, present-day algorithmically generated domain names typically have distributions that are quite different from those of human-created domain names. We propose a fully generative model for the probability distribution of benign (whitelisted) domain names, which can be used in an anomaly detection setting to identify putative algorithmically generated domain names. Unlike other methods, our approach can make detections without considering any of the additional (latency-producing) information sources often used to detect fast flux activity. Experiments on a large, publicly available data set of domain names associated with fast flux service networks show encouraging results relative to several baseline methods, with higher detection rates and low false positive rates.
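To make the general idea in the abstract concrete, the sketch below fits a null model on whitelisted names and flags names that the model finds improbable. It is not the authors' model: the abstract describes a fully generative model over characters, words, word lengths, and number of words, whereas this sketch swaps in a much simpler character-bigram model with add-one smoothing. The names `CharBigramModel`, `fit_threshold`, `is_anomalous`, the toy `WHITELIST`, and the 5% target false positive rate are all assumptions made for illustration.

```python
import math
from collections import Counter

# Assumed alphabet for domain labels; anything else is dropped before scoring.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-"
START, END = "^", "$"  # sentinel symbols marking the start and end of a name


def _symbols(name):
    """Lowercase the name, keep only alphabet characters, add sentinels."""
    kept = [ch for ch in name.lower() if ch in ALPHABET]
    return [START] + kept + [END]


class CharBigramModel:
    """Character-bigram null model with add-one smoothing (illustrative stand-in)."""

    def __init__(self):
        self.pair_counts = Counter()   # counts of (previous, next) symbol pairs
        self.prev_counts = Counter()   # counts of each symbol as a predecessor
        self.n_next = len(ALPHABET) + 1  # possible next symbols: alphabet plus END

    def fit(self, names):
        for name in names:
            syms = _symbols(name)
            for prev, nxt in zip(syms, syms[1:]):
                self.pair_counts[(prev, nxt)] += 1
                self.prev_counts[prev] += 1
        return self

    def avg_log_likelihood(self, name):
        """Per-transition log-likelihood, so long and short names are comparable."""
        syms = _symbols(name)
        total = 0.0
        for prev, nxt in zip(syms, syms[1:]):
            num = self.pair_counts[(prev, nxt)] + 1
            den = self.prev_counts[prev] + self.n_next
            total += math.log(num / den)
        return total / (len(syms) - 1)


def fit_threshold(model, benign_names, target_fpr=0.05):
    """Pick the score below which roughly target_fpr of benign names fall."""
    scores = sorted(model.avg_log_likelihood(n) for n in benign_names)
    index = min(int(target_fpr * len(scores)), len(scores) - 1)
    return scores[index]


def is_anomalous(model, name, threshold):
    """Flag a name whose likelihood under the benign model is below threshold."""
    return model.avg_log_likelihood(name) < threshold


# Toy whitelist for illustration only; a real one would hold many thousands of names.
WHITELIST = [
    "google", "facebook", "wikipedia", "amazon", "youtube", "twitter",
    "linkedin", "microsoft", "weather", "netflix", "stackoverflow", "github",
]

model = CharBigramModel().fit(WHITELIST)
threshold = fit_threshold(model, WHITELIST)
print(is_anomalous(model, "xkqzjvwq", threshold))
```

Scoring needs only the name string itself, which is the property behind the low-latency claim: no DNS lookups or traffic statistics are consulted. In practice the threshold would more reasonably be set on held-out whitelisted names than on the training set used here.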

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6869/4294760/29f134c1fab2/fx1.jpg

Similar articles

Detection of Algorithmically Generated Domain Names Using the Recurrent Convolutional Neural Network with Spatial Pyramid Pooling.

Entropy (Basel). 2020 Sep 22;22(9):1058. doi: 10.3390/e22091058.

UMUDGA: A dataset for profiling algorithmically generated domain names in botnet detection.

Data Brief. 2020 Mar 9;30:105400. doi: 10.1016/j.dib.2020.105400. eCollection 2020 Jun.

DNS dataset for malicious domains detection.

Data Brief. 2021 Sep 4;38:107342. doi: 10.1016/j.dib.2021.107342. eCollection 2021 Oct.

Learning Orthographic Structure With Sequential Generative Neural Networks.

Cogn Sci. 2016 Apr;40(3):579-606. doi: 10.1111/cogs.12258. Epub 2015 Jun 14.

Accurate mobile malware detection and classification in the cloud.

Springerplus. 2015 Oct 7;4:583. doi: 10.1186/s40064-015-1356-1. eCollection 2015.

AULD: Large Scale Suspicious DNS Activities Detection via Unsupervised Learning in Advanced Persistent Threats.

Sensors (Basel). 2019 Jul 19;19(14):3180. doi: 10.3390/s19143180.

Fast Flux Watch: A mechanism for online detection of fast flux networks.

J Adv Res. 2014 Jul;5(4):473-9. doi: 10.1016/j.jare.2014.01.002. Epub 2014 Jan 17.

Generation of a large gene/protein lexicon by morphological pattern analysis.

J Bioinform Comput Biol. 2004 Jan;1(4):611-26. doi: 10.1142/s0219720004000399.

Protein names and how to find them.

Int J Med Inform. 2002 Dec 4;67(1-3):49-61. doi: 10.1016/s1386-5056(02)00052-7.

Cited by

Detection of Algorithmically Generated Domain Names Using the Recurrent Convolutional Neural Network with Spatial Pyramid Pooling.

Entropy (Basel). 2020 Sep 22;22(9):1058. doi: 10.3390/e22091058.
