Forecasting GDP Growth: Application of Autoregressive Integrated Moving Average Model

This paper uses the Box-Jenkins approach to model and forecast real GDP growth in Ethiopia. Such an approach can readily provide forecasts of key macroeconomic variables in a limited-data environment. Based on this approach, the paper estimates an Autoregressive Integrated Moving Average, ARIMA (1,1,1), model and forecasts real GDP growth. Both the in-sample fit and the pseudo-out-of-sample forecasts show that the ARIMA model performs well and better than other forecasts.


Introduction
Economic forecasting is a common practice in economics. It is often done using either univariate time series models or multivariate economic models. Univariate time series models are based on a single variable; they are simple and have low data requirements, while multivariate (time series or large aggregate economic) models are relatively complex and rest on several economic assumptions. Macroeconometric models are largely guided by economic theory and cover the major economic sectors, activities and policies in an economy. They are formulated in a theoretically consistent manner, satisfying economic identities, for use in both forecasting and policy analysis. Nevertheless, they are data intensive and time consuming to build. Several countries in Africa are constrained by the lack of timely and sufficiently long data series for major economic indicators. Developing large macroeconomic models can therefore be challenging, and in such cases forecasting can be done using univariate time series models.
Several empirical studies compare the forecast performance of time series models. For the US, Stock and Watson (1998) reported that linear univariate autoregressions and vector autoregressive (VAR) models perform better than nonlinear models across a wide range of US macroeconomic series. Eitrheim, Husebo, and Nymoen (1999) found that a first-difference VAR model produces more accurate forecasts than the large macro model used by the central bank of Norway. In addition, Banerjee and Marcellino (2006) indicated that univariate models are more robust than multivariate models. Edge, Kiley, and Laforte (2010) also reported that simple time series models such as VARs produce forecasts that outperform those from large macro models.
Importantly, there is a trade-off between the precision with which parameters can be estimated and the complexity of a model (Robertson & Tallman, 1999), and macroeconomic data are often available only for short sample periods. Hence, simple univariate or VAR models could be superior to large macro models in forecasting. Therefore, in a data-scarce environment, univariate time series models can be used in lieu of large macroeconomic models for short-term forecasting purposes.
Although they are not institutionalized and some are dated, a few macro models have been developed for the Ethiopian economy (see the review in Geda & Zerfu, 2004). However, the use of univariate time series models for forecasting seems to be missing. Moreover, long series of macroeconomic data are scarce for Ethiopia. The main aim of this study is to show the use of a univariate time series model for forecasting in countries with a limited-data environment. Such an approach is simpler and can readily provide forecasts of key macroeconomic variables such as GDP and inflation. The study uses real GDP data for Ethiopia covering the period 1980-2014, drawn from the World Development Indicators database (World Bank, 2015). GDP is a key aggregate indicator of economic performance. It measures the final value of all goods and services produced in an economy over a given period. Policy makers (monetary as well as fiscal) require forecasts to gain insight into the future trend of the economy and to respond in a timely manner.
This paper follows the Box and Jenkins (1976) approach to fit a univariate model that can be used to forecast real GDP growth. Since the real GDP series is expected to be non-stationary, the paper takes the first difference of the series and inspects its autocorrelations and partial autocorrelations to identify the orders of the AR and MA terms. Based on a combination of the statistical significance of the estimated coefficients and the goodness of fit of the model, judged by the Mean Square Error (MSE) and the Akaike Information Criterion (AIC), the study estimates an ARIMA (1,1,1) model to forecast real GDP growth in Ethiopia. The paper then assesses the forecast accuracy of the model using in-sample and pseudo-out-of-sample forecasts. According to the results, the model performs well, with an in-sample forecast Root Mean Square Error (RMSE) of 0.063 and a pseudo-out-of-sample forecast RMSE of 0.011, although the forecasts often undershoot actual realizations. Compared with other forecasters (the IMF's World Economic Outlook and the World Bank's Global Economic Prospects), the univariate model performs better, given its lower forecast errors. Hence, in a data-scarce environment, countries could use the available time series data and fit univariate models to produce short-term forecasts that give an indication of where their economy is heading.
The paper is organized as follows. Section 2 discusses the Box-Jenkins methodology. Section 3 presents the econometric results and discusses the findings. The last section concludes and suggests further research directions.

Methodology
This study follows the Box and Jenkins (1976) methodology to develop a univariate time series forecasting model, often referred to as the Autoregressive Integrated Moving Average (ARIMA) model. The Box-Jenkins approach is based on the Wold representation theorem, which states that every stationary time series has an infinite moving average (MA) representation. This means that the future development of the series can be expressed as a function of its past development. The approach involves a four-stage iterative procedure (identification, estimation, diagnostic checking and forecasting) for developing a preferred forecasting model.
The general ARIMA (p,d,q) model for a series integrated of order 1 (d=1) is given in equation (1), where p is the order of the AR term, d is the order of integration and q is the order of the MA term.
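For reference, a standard textbook way of writing an ARIMA (p,1,q) model is sketched below. The notation is an assumption rather than the paper's own: y_t denotes the log of real GDP, c a constant, \phi_i the AR coefficients, \theta_j the MA coefficients and \varepsilon_t a white noise error.

```latex
\Delta y_t = c + \sum_{i=1}^{p} \phi_i \, \Delta y_{t-i}
           + \varepsilon_t + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j},
\qquad \Delta y_t = y_t - y_{t-1}

% Special case p = q = 1, i.e. ARIMA(1,1,1):
\Delta y_t = c + \phi_1 \, \Delta y_{t-1} + \varepsilon_t + \theta_1 \, \varepsilon_{t-1}
```

Under this notation, \Delta y_t is the growth rate of real GDP (the first difference of the log series), so the ARIMA (1,1,1) model is an ARMA (1,1) model for growth.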

Model Identification
In the Box-Jenkins approach, the first stage is to examine the data and determine whether the series is stationary. This means testing the series for stationarity using unit root tests, such as the Augmented Dickey-Fuller (ADF) and Phillips-Perron (PP) tests. The next step is to choose the AR(p) and MA(q) terms that should be included in the model. The paper uses the autocorrelation function (ACF) and the partial autocorrelation function (PACF) to decide on the appropriate AR and MA terms. The ACF at lag k is the correlation between the series and its own value k periods earlier, while the PACF measures the partial correlation between the same two values after accounting for the intermediate lags. An ACF that truncates at lag q suggests an MA(q) process, while a PACF that truncates at lag p suggests an AR(p) process.
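The identification step can be illustrated with a minimal sketch of the sample ACF and PACF in plain Python. This is not the software used in the paper; the function names are hypothetical, and the PACF is computed with the Durbin-Levinson recursion, which is one common choice among several.

```python
def sample_acf(x, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag of the sequence x."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)  # lag-0 sum of squares
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(max_lag + 1)
    ]


def sample_pacf(x, max_lag):
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    r = sample_acf(x, max_lag)
    pacf = [1.0]
    phi_prev = []  # AR coefficients of the order k-1 fit
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the first k-1 coefficients, then append the new one.
        phi_new = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
        phi_new.append(phi_kk)
        pacf.append(phi_kk)
        phi_prev = phi_new
    return pacf


def approx_95_band(n_obs):
    """Approximate 95 percent band for a white noise series: +/- 1.96 / sqrt(n)."""
    return 1.96 / n_obs ** 0.5
```

Lags at which the ACF or PACF falls outside the approximate band are candidates for the MA and AR orders, respectively. With about 34 annual observations the band is wide (roughly 0.34), which is one reason identification from a short sample is tentative.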

Model Estimation
The second stage is to estimate a class of ARIMA (p, d, q) models by maximum likelihood and obtain estimates of the coefficients of the AR and MA terms. Using a combination of the statistical significance of the estimated parameters, the significance of the overall model, and the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC), the paper selects a preferred ARIMA model. An AR model expresses a variable in terms of its own past values, while an MA model expresses the variable in terms of its past errors. A series can be modeled using a combination of AR and MA terms.
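As a sketch of how the information criteria enter model selection, the snippet below computes AIC and BIC from a maximized log-likelihood using the standard definitions. The function names are hypothetical, and any log-likelihood values passed in are placeholders, not results from this paper.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: k ln(n) - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


def pick_lowest_aic(candidates):
    """Return the label of the candidate with the lowest AIC.

    `candidates` maps a model label to (log_likelihood, n_params).
    """
    return min(candidates, key=lambda name: aic(*candidates[name]))
```

Both criteria reward fit and penalize additional parameters; BIC penalizes them more heavily once the sample exceeds about eight observations, so it tends to favor more parsimonious models.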

Diagnostic Testing
The third stage is to check the class of ARIMA models for adequacy. The study checks whether the residuals of the model are white noise, serially uncorrelated and normally distributed. Specifically, the study uses the Portmanteau (Q) test for white noise, the ACF and PACF to check the residuals for serial correlation, and the Jarque-Bera test to check the normality of the residuals.
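The two residual test statistics can be sketched in plain Python as follows. These are the textbook Ljung-Box and Jarque-Bera formulas, written here as an illustration; they may differ in detail from the small-sample adjusted versions reported by statistical packages, and the function names are hypothetical.

```python
import math


def ljung_box_q(resid, max_lag):
    """Ljung-Box Q = n (n + 2) * sum_{k=1..h} r_k^2 / (n - k)."""
    n = len(resid)
    mean = sum(resid) / n
    dev = [e - mean for e in resid]
    c0 = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, max_lag + 1):
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        total += r_k ** 2 / (n - k)
    return n * (n + 2) * total


def jarque_bera(resid):
    """Jarque-Bera statistic and its asymptotic chi-squared(2) p-value."""
    n = len(resid)
    mean = sum(resid) / n
    m2 = sum((e - mean) ** 2 for e in resid) / n
    m3 = sum((e - mean) ** 3 for e in resid) / n
    m4 = sum((e - mean) ** 4 for e in resid) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # For 2 degrees of freedom the chi-squared survival function is exp(-x / 2).
    p_value = math.exp(-jb / 2.0)
    return jb, p_value
```

Under the null hypothesis, Q is compared with a chi-squared distribution; when the residuals come from a fitted ARMA (p, q) model, the degrees of freedom are usually taken as the number of lags minus p + q. A large p-value in either test means the null (no autocorrelation, or normality) is not rejected.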

Forecasting
Using the preferred model, this paper forecasts real GDP growth both in-sample and pseudo-out-of-sample. The pseudo-out-of-sample forecast is used because the in-sample fit of a model may not be informative about its forecast performance for future values outside the sample (Robertson & Tallman, 1999). The paper then assesses the forecast accuracy of the model using the Mean Absolute Error (MAE) and the Root Mean Squared Error (RMSE). In-sample forecasting shows how well the model fits the data in a given sample, while the pseudo-out-of-sample forecast shows how well the model forecasts future values that lie outside the sample. Further, the study compares the forecast accuracy of the preferred model with other forecasts produced for Ethiopia.
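The two accuracy measures are simple to compute; a minimal sketch follows, with hypothetical function names. Any series passed in would be actual and forecast growth rates in the same units (here, log differences).

```python
import math


def mae(actual, forecast):
    """Mean Absolute Error: average of |actual - forecast|."""
    errors = [a - f for a, f in zip(actual, forecast)]
    return sum(abs(e) for e in errors) / len(errors)


def rmse(actual, forecast):
    """Root Mean Squared Error: square root of the average squared error."""
    errors = [a - f for a, f in zip(actual, forecast)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))
```

RMSE is never smaller than MAE and weights large errors more heavily, so a wide gap between the two signals a few large misses rather than uniformly moderate ones.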

Econometric Results and Discussion
The study uses real GDP data from the WDI covering the years 1981-2014. Figure 1 plots the log of real GDP and its first difference (GDP growth). Over most of the sample period, real GDP shows a non-linear growth trend, which suggests that the mean and variance of the series are not constant. There were some periods of decline in real GDP (for instance, in 1984/85 due to the extreme drought that affected the country, and in 1991/92 in the aftermath of a protracted civil war and at the beginning of the transition). The growth rates oscillate between negative and positive values, though they are positive for most of the period. Importantly, in the later period (since 2004) the country registered an impressive growth record, averaging 11 percent per annum. As discussed in the methodology section, the Box-Jenkins approach follows an iterative procedure of model identification, estimation, diagnostic checking and forecasting. The following subsections discuss the results of each stage.

Model Identification
The paper tests the stationarity of the real GDP series using the ADF and PP tests. Table 1 shows that the log real GDP series (lrgdp) is non-stationary in levels under both the Augmented Dickey-Fuller (ADF) and Phillips-Perron (PP) tests.
The log of real GDP is taken to linearize the variable and bring its distribution closer to normality. The series becomes stationary in first differences (dlrgdp), since the null hypothesis of a unit root is rejected at the 1 percent level of significance in both the ADF and PP tests, as shown in Table 1. The results are consistent under alternative specifications (constant, and constant with trend) as well as different lag lengths. Therefore, the paper uses the first difference of log real GDP in the ARIMA model. To determine the ARMA (p,q) model, the study uses the correlogram of the autocorrelation function (ACF) and partial autocorrelation function (PACF) of dlrgdp. The value of p in AR(p) is determined by the lag at which the PACF truncates, while the value of q in MA(q) is determined by the lag at which the ACF truncates. The Ljung-Box Q-statistic tests the randomness of the series up to a particular lag. Table A1 in the appendix suggests that p and q could be set to 1 at the 10 percent level of significance, while Figures 2a-2b suggest that p and q could be set to zero. However, the spike in the PACF at the third lag could affect the estimation results. Given the limited number of observations and the annual data at hand, the ARMA (p, q) orders could be set to p=1 and q=1. Hence, the paper compares different combinations of ARIMA (p, 1, q) and selects a preferred model based on the information criteria (Akaike Information Criterion and Bayesian Information Criterion) and the MSE of each model. Table 2 shows that ARIMA (1,1,1) is the preferred model, given its low AIC and MSE.

Model Estimation
Table 3 presents the estimates of the preferred ARIMA (1,1,1) model. The AR(1) coefficient is statistically significant at the 5 percent level, while the MA(1) term is insignificant. Overall, the model is statistically significant (Wald chi-squared(2) = 11.09, p-value = 0.003) with a good fit (MSE = 0.004). Although the series dlrgdp is stationary, the estimated AR(1) coefficient is large, perhaps due to the spike observed at the third lag in Figure 2b.

Diagnostic Checking
The paper checks the estimated model for statistical adequacy. First, the paper checks the stability of the ARIMA model using the inverse roots of the AR and MA characteristic polynomials in Figure A1. The AR and MA roots are 0.9 and 0.77, respectively; both lie inside the unit circle, implying stationarity and invertibility. Hence, the ARIMA (1,1,1) model is stable. Second, the paper tests the residuals for randomness (white noise), normality and autocorrelation. The Portmanteau Q-statistic test could not reject the null hypothesis of white noise residuals (Q-statistic = 12.67, p-value = 0.55). Further, the paper checks the normality of the residuals using the Jarque-Bera test and could not reject normality (adjusted chi-squared = 2.47). The paper also tests the autocorrelation of the residuals using Ljung-Box Q-statistics and provides the ACF and PACF graphs (see Figures 3a and 3b). The Ljung-Box Q-statistics in Table A2 in the appendix show that the null hypothesis of no residual autocorrelation is not rejected. Similarly, both the ACF and the PACF show no autocorrelation in the residuals. Overall, the residuals are white noise, normal (according to Figure A2) and serially uncorrelated. Hence, the diagnostic checks indicate that the ARIMA (1,1,1) model is statistically acceptable.

Forecasting
Based on the ARIMA (1,1,1) model, the paper forecasts real GDP growth both in-sample and pseudo-out-of-sample. First, the paper estimates the ARIMA (1,1,1) model using data for 1981-2014 and obtains static forecasts for the whole sample period. Second, the paper forecasts out of sample for the period 2015-2017. Figure 4 shows the actual values and the static forecasts for the sample period 1981-2014. Using both the in-sample and the pseudo-out-of-sample forecasts, the study assesses the accuracy of the forecasts. Table 4 shows the MAE and RMSE. The model forecasts well, with small forecast errors. Importantly, the pseudo-out-of-sample forecast errors are even smaller, suggesting good performance of the ARIMA (1,1,1) model.
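To make the distinction between static and out-of-sample forecasts concrete, the sketch below generates both for an ARMA (1,1) model of growth, g_t = c + phi * g_{t-1} + e_t + theta * e_{t-1}. This parameterization, the function names and the zero starting value for the first shock are assumptions made for illustration; the coefficients must be supplied by the user and are not the estimates in Table 3.

```python
def static_forecasts(growth, c, phi, theta):
    """One-step-ahead forecasts that use the actual lagged growth each period.

    Returns (fitted, residuals) for periods 2..T of the growth series.
    """
    fitted, resid = [], []
    prev_err = 0.0  # assumption: the pre-sample shock is set to zero
    for t in range(1, len(growth)):
        f = c + phi * growth[t - 1] + theta * prev_err
        prev_err = growth[t] - f
        fitted.append(f)
        resid.append(prev_err)
    return fitted, resid


def dynamic_forecasts(last_growth, last_err, c, phi, theta, horizon):
    """Multi-step forecasts beyond the sample.

    After the first step, the forecast replaces the unknown actual value and
    future shocks are set to their expected value of zero.
    """
    out = []
    prev, err = last_growth, last_err
    for _ in range(horizon):
        f = c + phi * prev + theta * err
        out.append(f)
        prev, err = f, 0.0
    return out
```

For a 2015-2017 exercise, `dynamic_forecasts` would be called with the 2014 growth rate, the 2014 residual from `static_forecasts` and `horizon=3`. With |phi| < 1, the dynamic forecasts converge geometrically toward the long-run mean c / (1 - phi).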

Conclusion
This paper aims to show the use of a univariate time series model for forecasting in countries with a limited-data environment. Such an approach is simpler and can readily provide forecasts of key macroeconomic variables such as GDP and inflation. The paper uses real GDP data for Ethiopia covering the period 1980-2014, drawn from the World Development Indicators database (World Bank, 2015).
The paper follows the Box and Jenkins (1976) approach to fit a univariate model that can be used to forecast GDP growth. Since the real GDP series is non-stationary, the paper takes the first difference of the series and inspects its autocorrelations and partial autocorrelations to identify the orders of the AR and MA terms. Based on a combination of the statistical significance of the estimated coefficients and the goodness of fit of the model, judged by the AIC and MSE, the paper estimates an ARIMA (1,1,1) model to forecast real GDP growth in Ethiopia. It then assesses the forecast accuracy of the model using in-sample and pseudo-out-of-sample forecasts. The preferred model performs well, with an in-sample forecast RMSE of 0.063 and a pseudo-out-of-sample forecast RMSE of 0.011. Compared with other forecasters, the univariate model performs better, given its lower forecast errors.
Hence, in a data-scarce environment, countries could use the available time series data and fit univariate models to produce short-term forecasts that give a preview of where their economy is heading. To further improve the modelling and forecasting of GDP growth, the paper suggests that future studies investigate VAR models and compare them with univariate models and with other forecasters.

Figure 1: Time Series Plot of Log Real GDP and its First Difference

Figure 4: Actual and Static Forecast of Real GDP Growth