dc.description.abstract | ABSTRACT
The use of cloud computing has grown exponentially since its inception. Availability of the cloud, however, has been a problem for users and Cloud Service Providers (CSPs) alike, and outages have been on the rise. This problem could be attributed to the fact that engineers building Availability Mechanisms (AMs) and those studying outage causes do not work together. The general objective of the study was to develop and evaluate an availability mechanism model for service outages in cloud computing environments. The study specifically sought to: identify the causes of outages in cloud computing environments; identify the availability mechanisms in use in cloud computing environments; formulate a model that establishes correspondences between AMs and outage causes; and evaluate the performance of the model by measuring its service availability levels in relation to the settings of the cloud computing system parameters. A model called the Ferris Wheel of Availability (FWA) was developed by relating AMs to outage causes, with AMs being conjugate in nature in relation to their respective outage causes. There were seven categories of AMs and seven categories of outage causes. AMs were categorized as cluster management, component redundancy, limit detection policy, checkpointing, node management, Active-X variant and fault tolerance. Outage causes were categorized as configuration issues, hardware issues, resource exhaustion, security issues, node failures, network issues and natural disasters. The model was tested using CloudSim, a discrete, deterministic simulator that allows users to set up and run customized configurations. The simulator was configured to run each outage cause individually; the applicable AMs were then injected simultaneously and the output recorded.
Each outage cause was paired with two AMs, and the findings confirmed the effectiveness of the proposed model structure in increasing service availability at infrastructure level. Key findings were that checkpointing is not effective as an AM against resource exhaustion, and that effective management of a cluster results in effective management of the nodes in it. The results were inconclusive as to whether limit detection policy is effective as an AM against security issues. The study also suggested that limit detection policy be renamed limit prevention policy. The study introduced a new availability parameter called execution availability and recommended its use together with service availability in predicting overall availability at infrastructure level. The key contributions of the study were: development of the FWA model, which establishes correspondences between AMs and outage causes, since no model establishing these correspondences had been developed before; discovery of the relationships between AMs and outage causes based on simulation tests and subsequent analysis; and introduction of an availability parameter called execution availability that measures the ratio of tasks allocated versus tasks executed. Further work is recommended on the feasibility of merging two or more simulators to resolve results that were inconclusive using one simulator; an extension to the simulator in use may also be investigated. The use of the FWA model at CSP level is also recommended, as it helps analysts and developers build for availability from the very foundation as opposed to adopting a wait-and-see attitude and countering outages as they occur. The outcome of the study suggests that applying the FWA model in a cloud computing infrastructure has the potential to increase availability in the cloud. | en_US |