Model Selection in Generalized Estimating Equations Based on Kullback’s I-Divergence
Abstract/Overview
The method of Generalized Estimating Equations (GEE) is widely used to analyze correlated longitudinal data and yields consistent estimates that are robust to misspecification of the working correlation structure. However, the estimates lose efficiency when the working correlation structure is not close to the true one, so the selected models may not be generalizable, well-fitting, or parsimonious. The Quasi-likelihood Information Criterion (QIC), which results from using Kullback’s I-divergence as the targeted discrepancy, is widely used in the GEE framework to select the best correlation structure and the best subset of predictors. However, QIC has been shown to have success rates below 50%, and hence a high chance of selecting a misspecified structure. Using a misspecified structure results in an efficiency loss of up to 40% in the GEE estimator relative to using the correct correlation structure. Moreover, the independence structure, which QIC tends to favor, results in an efficiency loss of up to 60% in the GEE estimates. Through numerical simulations, this study sought to (i) investigate the properties of QIC in selecting the true working correlation structure and the set of covariates for the mean structure in GEEs, (ii) develop a hybrid methodology based on the Empirical-likelihood Akaike Information Criterion (EAIC) and QIC for model selection in the GEE framework, and (iii) apply the proposed hybrid methodology to shareholder value creation data. With regard to consistency in selecting the true correlation structure, we established that restricting the selection set to parsimonious structures and penalizing for the number of estimated correlation and regression parameters are sufficient conditions for QIC to select the true structure with probability approaching one as n → ∞. With regard to the selection of covariates, we established that QIC has high sensitivity but low sparsity.
The type I error rate converged to 0.3 as n → ∞, while the type II error rate quickly diminished to zero. The low under-fitting probabilities imply high statistical power, so rejection of a false null hypothesis is essentially guaranteed for sufficiently large n, even when the effect size is small. We further established that the hybrid methodology (EQAIC) yielded models with lower mean squared error (MSE) than models selected by QIC alone. When applied to the shareholder value creation data, the methodology identified an AR-1 correlation structure with ρ = 0.775, and the key drivers of shareholder value creation, ranked by relative importance, were the growth rate of earnings, economic spread, firm size, leverage, dividend policy, and level of financial distress. This is consistent with the tendency of QIC to over-fit, since a more complex model than the Gordon Constant Growth model was preferred. However, using the AR-1 correlation structure selected by EAIC resulted in a model with lower MSE than the model selected using QIC alone. Based on these findings, we conclude that correctly specifying the working correlation structure improves the efficiency of GEE estimates.
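
The efficiency claim in the conclusion can be illustrated with a minimal numerical sketch. It is not the study's simulation design. It assumes a linear mean model with unit variance, a hypothetical three-column design matrix, and a true AR-1 correlation with ρ = 0.775 (the value reported for the application data). Under those assumptions it compares the sandwich covariance of the GEE estimator under a working independence structure with the covariance under the correctly specified AR-1 structure. The helper names `ar1_corr` and `gee_sandwich_cov` are introduced here for illustration only.

```python
import numpy as np


def ar1_corr(rho, t):
    """AR-1 correlation matrix: entry (j, k) equals rho ** |j - k|."""
    idx = np.arange(t)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def gee_sandwich_cov(X_list, working, true_corr):
    """Sandwich covariance of the GEE estimator for a linear mean model.

    Assumes identity link and unit marginal variance, so the estimating
    equations reduce to weighted least squares with weight inv(working).
    """
    W = np.linalg.inv(working)
    bread = sum(X.T @ W @ X for X in X_list)
    meat = sum(X.T @ W @ true_corr @ W @ X for X in X_list)
    bread_inv = np.linalg.inv(bread)
    return bread_inv @ meat @ bread_inv


rng = np.random.default_rng(0)
n_subjects, n_times, rho = 200, 5, 0.775  # rho taken from the abstract

# Hypothetical design: intercept, a time-varying covariate, a time trend.
X_list = [
    np.column_stack(
        [np.ones(n_times), rng.normal(size=n_times), np.arange(n_times)]
    )
    for _ in range(n_subjects)
]

true_corr = ar1_corr(rho, n_times)
cov_ar1 = gee_sandwich_cov(X_list, true_corr, true_corr)          # correct structure
cov_ind = gee_sandwich_cov(X_list, np.eye(n_times), true_corr)    # working independence

# Relative efficiency of working independence, per coefficient.
rel_eff = np.diag(cov_ar1) / np.diag(cov_ind)
print(rel_eff)
```

Each entry of `rel_eff` is a variance ratio for one regression coefficient, and values below one indicate the efficiency lost by assuming independence. In this linear setting the ratio cannot exceed one, by the Gauss-Markov theorem. The size of the loss depends on the design and on ρ, so the numbers from this sketch should not be read as reproducing the 40% or 60% figures quoted above.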