Department of Medical Informatics, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran.
Department of Computer Science, Faculty of Engineering, Payame Noor University (PNU), Iran.
Int J Med Inform. 2019 Dec;132:103976. doi: 10.1016/j.ijmedinf.2019.103976. Epub 2019 Sep 25.
There is increasing demand for access to medical information via patient portals. However, one of the challenges to the widespread utilisation of such services is maintaining the security of these portals. Recent reports show an alarming increase in cyber-attacks that use crawlers. These software programs crawl web pages and can execute various commands, such as attacking web servers, cracking passwords, harvesting users' personal information, and probing servers for vulnerabilities. The aim of this research is to develop a new, effective model that uses machine-learning techniques to detect malicious crawlers based on their navigational behavior.
In this research, different methods of crawler detection were investigated. Log files from a sample of compromised websites were analysed, and the best features for detecting crawlers were extracted. Then, after several machine-learning algorithms, including Support Vector Machine (SVM), Bayesian Network and Decision Tree, were tested and compared, the best model was developed using the most appropriate features, and its accuracy was evaluated.
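The abstract does not list the features that were extracted from the logs. Purely as an illustration, the following Python sketch shows how navigational features of the kind often used in crawler detection might be derived from web server log lines. The log format (Common Log Format), grouping sessions by client IP alone, and every feature name below are assumptions for this example, not the authors' feature set.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumed input: Common Log Format lines; the study's actual log format is not stated.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif")


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    return {
        "ip": m.group("ip"),
        "time": datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z"),
        "method": m.group("method"),
        "path": m.group("path"),
        "status": int(m.group("status")),
    }


def session_features(lines):
    """Group requests by client IP (a simplification) and compute
    hypothetical navigational features for each group."""
    sessions = defaultdict(list)
    for line in lines:
        rec = parse_line(line)
        if rec is not None:
            sessions[rec["ip"]].append(rec)

    features = {}
    for ip, reqs in sessions.items():
        reqs.sort(key=lambda r: r["time"])
        n = len(reqs)
        features[ip] = {
            "n_requests": n,
            # Well-behaved crawlers usually fetch robots.txt first.
            "robots_txt": any(r["path"] == "/robots.txt" for r in reqs),
            # Share of HEAD requests, often higher for automated clients.
            "head_ratio": sum(r["method"] == "HEAD" for r in reqs) / n,
            # Share of 4xx/5xx responses, e.g. from probing non-existent pages.
            "error_ratio": sum(r["status"] >= 400 for r in reqs) / n,
            # Share of image requests; browsers load images, many crawlers skip them.
            "image_ratio": sum(r["path"].lower().endswith(IMAGE_EXTENSIONS) for r in reqs) / n,
            "duration_s": (reqs[-1]["time"] - reqs[0]["time"]).total_seconds(),
        }
    return features
```

The resulting per-session feature vectors could then be passed to any of the classifiers compared in the study (for example, an SVM). A production pipeline would normally also split sessions with an inactivity timeout rather than by IP address alone.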
Our analysis showed that SVM-based models can achieve higher performance in detecting malicious crawlers (F-measure = 0.97) than Bayesian Network (F-measure = 0.88), Decision Tree (F-measure = 0.95) and artificial neural network (ANN) (F-measure = 0.87) models. Moreover, extracting appropriate features increased the performance of the SVM (F-measure = 0.98), the Bayesian Network (F-measure = 0.94), the Decision Tree (F-measure = 0.96) and the ANN (F-measure = 0.92).
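All of the results above are reported as the F-measure, the harmonic mean of precision P and recall R, i.e. F = 2PR / (P + R). A minimal sketch that computes it from confusion-matrix counts for the "malicious crawler" class follows; the counts in the example are hypothetical and are not taken from the study.

```python
def f_measure(tp: int, fp: int, fn: int) -> float:
    """F1 score for the positive class from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 95 crawler sessions correctly flagged,
# 3 benign sessions wrongly flagged, 2 crawler sessions missed.
print(round(f_measure(tp=95, fp=3, fn=2), 3))  # prints 0.974
```

Equivalently, F = 2TP / (2TP + FP + FN), which here is 190 / 195.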
Security concerns are among the potential barriers to widespread utilisation of patient portals. Machine learning algorithms can accurately detect malicious crawlers and thereby enhance the security of patients' sensitive information. Selecting appropriate features when developing these algorithms can markedly increase their accuracy.