An evaluation of hybrid machine learning classifier models for identification of terrorist groups in the aftermath of an attack
Abstract/ Overview
Terrorist attacks around the world have led to loss of life and property, fear, and general insecurity. Terrorist acts are planned and perpetrated by loosely organized groups of people operating in shadowy networks that are difficult to identify. Machine learning classifier algorithms have been used to accurately identify terrorist groups and weapon types in India, Egypt, Pakistan, and the United Kingdom. However, the urgency of responding to a terrorist attack, and the nature of the analysis subsequently required to identify the group involved, demand that the classifiers yield highly accurate outcomes. Combining classifier algorithms into hybrids is proposed as a new way of improving accuracy. To date, there has not been sufficient research on finding combinations of Naïve Bayes, K-Nearest Neighbor, Decision Trees, Support Vector Machines, and Multi-Layer Perceptron as base classifier algorithm models, together with a resample sample size percent, that give optimum accuracy in the identification of terrorist groups in the aftermath of an attack. The aim of the study is to build and evaluate hybrid classifier algorithm models for identification of terrorist groups. Specifically, it builds and evaluates base classifier algorithm models, builds and evaluates hybrid classifier algorithm models by combining the base classifier algorithm models, and compares the performance of the classifier algorithm models. The study adopts a randomized block experimental research design, using the Waikato Environment for Knowledge Analysis (WEKA) tool to build and evaluate the classifier algorithm models and the 1999-2017 sub-Saharan Africa terrorism dataset from the Global Terrorism Database (GTD). Using the WEKA filter-based search and ranker routine, the features country, region, attack type, target type, group name, and weapon type are ranked highest of the dataset's 23 attributes for identification of the terrorist group name.
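The idea of combining base classifiers into a hybrid can be illustrated with a minimal sketch. It assumes a simple combination rule (averaging the class probabilities of the base models), which the abstract does not specify, and it uses two deliberately small stand-ins: a K-Nearest Neighbor model with Hamming distance and a one-level decision tree. All function names, attribute values, and group labels below are hypothetical toy values, not taken from the GTD or from WEKA.

```python
from collections import Counter

def knn_distribution(train, x, k=3):
    """Class probabilities from the k nearest rows under Hamming distance."""
    ranked = sorted(train, key=lambda row: sum(a != b for a, b in zip(row[0], x)))
    votes = Counter(label for _, label in ranked[:k])
    return {label: n / k for label, n in votes.items()}

def stump_distribution(train, x, attr):
    """Class probabilities from a one-level decision tree split on attribute `attr`."""
    matching = [label for feats, label in train if feats[attr] == x[attr]]
    if not matching:  # unseen attribute value: fall back to the class priors
        matching = [label for _, label in train]
    votes = Counter(matching)
    return {label: n / len(matching) for label, n in votes.items()}

def hybrid_predict(distributions):
    """Average the base models' probabilities and return the most probable class."""
    classes = set().union(*distributions)
    avg = {c: sum(d.get(c, 0.0) for d in distributions) / len(distributions)
           for c in classes}
    return max(sorted(avg), key=avg.get)  # sorted() makes tie-breaking deterministic

# Hypothetical toy rows: (country, attack type, weapon type) -> group label.
train = [
    (("C1", "bombing", "explosives"), "Group A"),
    (("C1", "assault", "firearms"), "Group A"),
    (("C2", "bombing", "explosives"), "Group B"),
    (("C2", "assault", "firearms"), "Group B"),
    (("C3", "bombing", "explosives"), "Group B"),
]
attack = ("C1", "bombing", "firearms")

prediction = hybrid_predict([
    knn_distribution(train, attack, k=3),
    stump_distribution(train, attack, attr=0),  # split on country
])
print(prediction)
```

Averaging probabilities is only one possible combination rule; majority voting or stacking would fit the same structure by replacing `hybrid_predict`.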
Data imbalance in the dataset is addressed by varying the resample sample size percent for optimum performance. The classifier algorithm models were evaluated and compared on accuracy and build time as performance metrics, using 10-fold cross validation, test split, and an ANOVA test. The results suggest that (i) hybrid classifier algorithm models yield higher accuracy rates than the base models, (ii) accuracy rates for 10-fold cross validation are higher than those for test split, and (iii) resample sample size percent, as a technique for addressing class imbalance, affects accuracy and yields optimum accuracy rates at a value of 1000 for the available dataset. The results show a significant improvement in accuracy between the control group and the experimental group. The study concludes that hybrid KD (a combination of K-Nearest Neighbor and Decision Trees) outperformed all other classifier algorithm models at a resample sample size percent of 1000, with an accuracy rate of 88.18% and a build time of 0.03 seconds for 10-fold cross validation, and an accuracy rate of 87.66% and a build time of 1.03 seconds for test split, in the identification of terrorist groups in the aftermath of an attack for the sub-Saharan Africa dataset. The study contributes a systematic process for building a hybrid classifier algorithm model and establishes a resample sample size percent of 1000 for optimum accuracy rates on this dataset.
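To make the evaluation terms concrete, the sketch below shows one way to read "resample sample size percent" and 10-fold cross validation. It assumes that resampling draws rows with replacement so that the output size is the given percentage of the input size (a value of 1000 gives ten times as many rows), and that resampling is applied before the folds are formed; the abstract does not state the exact WEKA filter settings or the order of steps. The function names, the toy rows, and the majority-class stand-in model are hypothetical.

```python
import random
from collections import Counter

def resample(rows, sample_size_percent, seed=1):
    """Draw rows with replacement; output size is percent/100 times the input size."""
    rng = random.Random(seed)
    n_out = int(len(rows) * sample_size_percent / 100)
    return [rng.choice(rows) for _ in range(n_out)]

def k_fold_indices(n, k=10, seed=1):
    """Shuffle the indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_val_accuracy(rows, fit, k=10):
    """Accuracy pooled over k folds; `fit(train)` must return a predict(x) function."""
    correct = 0
    for fold in k_fold_indices(len(rows), k):
        held_out = set(fold)
        train = [rows[i] for i in range(len(rows)) if i not in held_out]
        predict = fit(train)
        correct += sum(predict(rows[i][0]) == rows[i][1] for i in fold)
    return correct / len(rows)

def fit_majority(train):
    """Stand-in model: always predicts the most frequent label in the training rows."""
    label = Counter(lbl for _, lbl in train).most_common(1)[0][0]
    return lambda x: label

# Hypothetical imbalanced toy data: 16 rows of one class, 4 of another.
rows = [((i,), "Group A") for i in range(16)] + [((i,), "Group B") for i in range(4)]
bigger = resample(rows, sample_size_percent=1000)
accuracy = cross_val_accuracy(bigger, fit_majority, k=10)
print(len(rows), len(bigger), round(accuracy, 4))
```

Any model that exposes the same `fit(train)` interface, including a hybrid of several base models, can be passed to `cross_val_accuracy` in place of `fit_majority`.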