Saxena Deepika, Singh Ashutosh Kumar
Department of Computer Applications, National Institute of Technology, Kurukshetra, India.
J Supercomput. 2022;78(6):8003-8024. doi: 10.1007/s11227-021-04235-z. Epub 2022 Jan 6.
The indispensable role of cloud computing in every digital service has raised its resource usage exponentially. The ever-growing demand for cloud resources undermines service availability, leading to critical challenges such as cloud outages, SLA violations, and excessive power consumption. Previous approaches have addressed this problem by utilizing multiple cloud platforms or running multiple replicas of a Virtual Machine (VM), resulting in high operational cost. This paper addresses the problem from a different perspective by proposing a novel Online virtual machine Failure Prediction and Tolerance Model (OFP-TM) with high availability awareness embedded in physical machines as well as virtual machines. Failure-prone VMs are identified in real time from their predicted future resource usage, using an ensemble-based resource predictor. These VMs are assigned to a failure tolerance unit comprising a resource provision matrix and a Selection Box (S-Box) mechanism, which triggers the migration of failure-prone VMs and handles any outage beforehand while maintaining the desired level of availability for cloud users. The proposed model is evaluated and compared against existing related approaches by simulating a cloud environment and executing several experiments on a real-world workload, the Google Cluster dataset. The results show that OFP-TM improves availability by up to 33.5% and reduces the number of live VM migrations by up to 83.3%, compared with operation without OFP-TM.
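The idea of flagging failure-prone VMs from predicted resource usage can be illustrated with a minimal sketch. The abstract does not specify the ensemble members, the features, or the decision rule, so everything below is an assumption made for illustration: the three base forecasters, the simple averaging, the 0.9 utilisation threshold, and all function and VM names are hypothetical and are not taken from OFP-TM.

```python
from statistics import mean


def moving_average(history, window=3):
    """Forecast the next value as the mean of the last `window` samples."""
    return mean(history[-window:])


def last_value(history):
    """Naive forecast: the next value equals the most recent sample."""
    return history[-1]


def linear_trend(history):
    """Extrapolate one step ahead using the slope of the last two samples."""
    if len(history) < 2:
        return history[-1]
    return history[-1] + (history[-1] - history[-2])


def ensemble_forecast(history):
    """Average the base forecasts and clamp to the [0, 1] utilisation range.

    Assumption: plain averaging stands in for whatever combination rule
    the paper's ensemble predictor actually uses.
    """
    forecasts = [moving_average(history), last_value(history), linear_trend(history)]
    return min(1.0, max(0.0, mean(forecasts)))


def failure_prone_vms(usage_by_vm, threshold=0.9):
    """Return sorted IDs of VMs whose forecast utilisation reaches `threshold`.

    Assumption: the 0.9 threshold is illustrative, not a value from the paper.
    """
    return sorted(
        vm for vm, history in usage_by_vm.items()
        if ensemble_forecast(history) >= threshold
    )


# Hypothetical utilisation traces (fraction of allocated capacity per interval).
usage = {
    "vm-a": [0.40, 0.42, 0.41, 0.43],
    "vm-b": [0.70, 0.80, 0.90, 0.98],
}
flagged = failure_prone_vms(usage)
print(flagged)
```

In this sketch a flagged VM would be the one handed to a tolerance step (for example, a migration decision); how OFP-TM's resource provision matrix and S-Box make that decision is not described in the abstract and is not modelled here.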