FastEmbed: Predicting vulnerability exploitation possibility based on ensemble machine learning algorithm.

Affiliation

College of Cybersecurity, Sichuan University, Chengdu, Sichuan, P.R. China.

Publication

PLoS One. 2020 Feb 6;15(2):e0228439. doi: 10.1371/journal.pone.0228439. eCollection 2020.

DOI: 10.1371/journal.pone.0228439
PMID: 32027693
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7004314/

Abstract

In recent years, the number of vulnerabilities discovered and publicly disclosed has risen sharply. However, vulnerabilities are not equally valuable to attackers: only a small fraction of them are actually exploited. It has therefore become imperative for organizations to quickly rule out non-exploitable vulnerabilities and to prioritize patches optimally with limited resources. Recent works use machine learning techniques to predict exploited vulnerabilities from features extracted from open-source intelligence (OSINT). However, given the explosive growth of vulnerability information, past methods leave room for improvement when applied to multiple threat intelligence sources, and a more general method is needed to handle diverse sources. Moreover, previous methods processed vulnerability descriptions with traditional text processing techniques, which capture only static statistical characteristics and ignore the context and meaning of the words in the text. To address these challenges, we propose fastEmbed, an exploit prediction model based on a combination of the fastText and LightGBM algorithms. We replicate key portions of the state-of-the-art work on exploit prediction and use them as benchmark models. By learning embeddings of vulnerability-related text on extremely imbalanced data sets, our model outperforms the baseline model in both generalization ability and prediction ability without temporal intermixing, with an average overall improvement of 6.283%. In addition, in predicting exploits in the wild, our model also outperforms the baseline model, achieving an F1 measure of 0.586 on the minority class (a 33.577% improvement over the work using features from the darkweb/deepweb).
The results demonstrate that the model can effectively improve the ability to describe the exploitability of vulnerabilities and to predict exploits in the wild.
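
The abstract credits fastEmbed's improvement to learning embeddings of vulnerability text instead of relying on static term statistics. The following is a minimal, self-contained sketch of a fastText-style text representation, not the authors' code: each word is wrapped in `<` and `>` and split into character n-grams, every unit is hashed into a shared bucket table, a word vector is the average of its unit vectors, and a description vector is the average of its word vectors. The bucket vectors are deterministic pseudo-random placeholders standing in for trained weights, and `DIM`, `BUCKETS`, `MIN_N`, `MAX_N` and all function names are hypothetical; the paper's fastText settings and the LightGBM classifier that consumes the resulting features are not reproduced here.

```python
import hashlib
import random

# Hypothetical settings for illustration only; not the paper's configuration.
DIM = 8              # embedding dimension
BUCKETS = 2 ** 16    # number of hash buckets shared by all subword units
MIN_N, MAX_N = 3, 5  # character n-gram lengths


def subword_units(word: str) -> list[str]:
    """The bracketed word plus its character n-grams, in the style of fastText."""
    token = f"<{word}>"
    units = [token]
    for n in range(MIN_N, MAX_N + 1):
        for i in range(len(token) - n + 1):
            units.append(token[i:i + n])
    return units


def unit_vector(unit: str) -> list[float]:
    """Hash a unit to a bucket and return that bucket's vector.

    Placeholder: a deterministic pseudo-random vector stands in for the
    weights fastText would learn during training.
    """
    digest = hashlib.sha256(unit.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "little") % BUCKETS
    rng = random.Random(bucket)
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]


def mean(vectors: list[list[float]]) -> list[float]:
    """Element-wise average of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def word_vector(word: str) -> list[float]:
    """A word vector is the average of its subword-unit vectors."""
    return mean([unit_vector(u) for u in subword_units(word.lower())])


def text_embedding(text: str) -> list[float]:
    """A description vector is the average of its word vectors."""
    words = text.split()
    if not words:
        return [0.0] * DIM
    return mean([word_vector(w) for w in words])


if __name__ == "__main__":
    description = "Buffer overflow allows remote attackers to execute arbitrary code"
    features = text_embedding(description)
    print(len(features))  # 8 features, ready to feed a gradient-boosting classifier
```

Because every word is built from subword units, a token never seen before (a product name or a misspelling in a vulnerability description, say) still receives a non-zero vector; handling out-of-vocabulary words this way is a documented property of fastText's subword model.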
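
Reading "without temporal intermixing" as an evaluation in which no training sample is newer than any test sample, the split can be sketched as follows. This is an illustration under assumptions: the record layout `(published, vuln_id, exploited)`, the identifiers, the function name, and the cutoff date are all hypothetical, and the abstract does not state the paper's actual split.

```python
from datetime import date

# Hypothetical record layout: (publication date, vulnerability id, exploited label).
Record = tuple[date, str, int]


def temporal_split(records: list[Record], cutoff: date) -> tuple[list[Record], list[Record]]:
    """Train on records published before `cutoff`, test on the rest.

    Unlike a random shuffle, every training record predates every test
    record, so no "future" information leaks into training.
    """
    ordered = sorted(records, key=lambda r: r[0])
    train = [r for r in ordered if r[0] < cutoff]
    test = [r for r in ordered if r[0] >= cutoff]
    return train, test


if __name__ == "__main__":
    records = [
        (date(2018, 1, 15), "vuln-c", 0),
        (date(2016, 3, 1), "vuln-a", 0),
        (date(2018, 11, 2), "vuln-d", 1),
        (date(2017, 7, 9), "vuln-b", 1),
    ]
    train, test = temporal_split(records, cutoff=date(2018, 1, 1))
    print([r[1] for r in train])  # ['vuln-a', 'vuln-b']
    print([r[1] for r in test])   # ['vuln-c', 'vuln-d']
```

A random shuffle would mix newer vulnerabilities into the training set, letting the model see information that would not have existed at prediction time, which tends to overstate real-world performance.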
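
On extremely imbalanced data, accuracy is a weak signal, since a model that labels every vulnerability as not exploited is right for almost all of them; this is why the F1 measure on the minority (exploited) class is the figure of merit reported above. A small sketch of that metric, using made-up labels rather than the paper's data; the function names are illustrative:

```python
def minority_f1(y_true: list[int], y_pred: list[int], positive: int = 1) -> float:
    """F1 = 2PR / (P + R), computed for the `positive` (minority) class only."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: F1 is 0 by convention
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Made-up labels: 2 exploited (1) among 10 vulnerabilities.
    y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
    always_no = [0] * 10
    model = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
    print(accuracy(y_true, always_no), minority_f1(y_true, always_no))  # 0.8 0.0
    print(accuracy(y_true, model), minority_f1(y_true, model))          # 0.8 0.5
```

Both predictors reach the same accuracy, yet only the second identifies any exploited vulnerability, a difference that only the minority-class F1 reveals.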

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b682/7004314/240e8d8d73fc/pone.0228439.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b682/7004314/f2562c86b2f4/pone.0228439.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b682/7004314/9ebc6c700e82/pone.0228439.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b682/7004314/2b55e9776999/pone.0228439.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b682/7004314/2d79e0a33b6f/pone.0228439.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b682/7004314/fcd497b47e72/pone.0228439.g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b682/7004314/fd746ab7605f/pone.0228439.g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b682/7004314/70ad33faae74/pone.0228439.g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b682/7004314/7e21b11c3eb6/pone.0228439.g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b682/7004314/87c3e04f9c8b/pone.0228439.g010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b682/7004314/d89058092991/pone.0228439.g011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b682/7004314/71ff59ff89d0/pone.0228439.g012.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b682/7004314/5bea6b3aee4a/pone.0228439.g013.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b682/7004314/311e346de00a/pone.0228439.g014.jpg

Similar articles

1
FastEmbed: Predicting vulnerability exploitation possibility based on ensemble machine learning algorithm.
PLoS One. 2020 Feb 6;15(2):e0228439. doi: 10.1371/journal.pone.0228439. eCollection 2020.
2
An Improved Vulnerability Exploitation Prediction Model with Novel Cost Function and Custom Trained Word Vector Embedding.
Sensors (Basel). 2021 Jun 20;21(12):4220. doi: 10.3390/s21124220.
3
Vulnerability extraction and prediction method based on improved information gain algorithm.
PLoS One. 2024 Sep 10;19(9):e0309809. doi: 10.1371/journal.pone.0309809. eCollection 2024.
4
Zero-Day Malware Detection and Effective Malware Analysis Using Shapley Ensemble Boosting and Bagging Approach.
Sensors (Basel). 2022 Apr 6;22(7):2798. doi: 10.3390/s22072798.
5
Enhancing the security of patients' portals and websites by detecting malicious web crawlers using machine learning techniques.
Int J Med Inform. 2019 Dec;132:103976. doi: 10.1016/j.ijmedinf.2019.103976. Epub 2019 Sep 25.
6
A systematic review of fuzzing based on machine learning techniques.
PLoS One. 2020 Aug 18;15(8):e0237749. doi: 10.1371/journal.pone.0237749. eCollection 2020.
7
Feasibility of Reidentifying Individuals in Large National Physical Activity Data Sets From Which Protected Health Information Has Been Removed With Use of Machine Learning.
JAMA Netw Open. 2018 Dec 7;1(8):e186040. doi: 10.1001/jamanetworkopen.2018.6040.
8
A detection method for android application security based on TF-IDF and machine learning.
PLoS One. 2020 Sep 11;15(9):e0238694. doi: 10.1371/journal.pone.0238694. eCollection 2020.
9
Examining the Capacity of Text Mining and Software Metrics in Vulnerability Prediction.
Entropy (Basel). 2022 May 5;24(5):651. doi: 10.3390/e24050651.
10
An ensemble heterogeneous classification methodology for discovering health-related knowledge in social media messages.
J Biomed Inform. 2014 Jun;49:255-68. doi: 10.1016/j.jbi.2014.03.005. Epub 2014 Mar 16.

Cited by

1
EBMGP: a deep learning model for genomic prediction based on Elastic Net feature selection and bidirectional encoder representations from transformer's embedding and multi-head attention pooling.
Theor Appl Genet. 2025 Apr 19;138(5):103. doi: 10.1007/s00122-025-04894-z.
2
Assemble the shallow or integrate a deep? Toward a lightweight solution for glyph-aware Chinese text classification.
PLoS One. 2023 Jul 28;18(7):e0289204. doi: 10.1371/journal.pone.0289204. eCollection 2023.
