A strong assumption of time series regression, a widely used technique in econometrics, is stationarity. It requires that the means, variances, and autocovariances (autocorrelations) of the variables entered in the regression be independent of time. A stationary series must not wander too far from its mean. In most cases, the assumption is violated (non-stationarity, e.g., a random walk), and running such a regression produces what is called a spurious regression. A possible solution to this problem is to transform the variables.
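To see the problem concretely, here is a minimal sketch in Python (assuming numpy and statsmodels are installed) that regresses one simulated random walk on another, independent one; the seed and sample size are arbitrary:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(42)
    n = 500

    # Two independent random walks: cumulative sums of white noise.
    x = np.cumsum(rng.normal(size=n))
    y = np.cumsum(rng.normal(size=n))

    # Regressing one on the other often yields a "significant" slope and a
    # sizable R-squared even though the series are unrelated by construction.
    res = sm.OLS(y, sm.add_constant(x)).fit()
    print(res.params, res.pvalues, res.rsquared)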
As presented in Investopedia, there exist different types of non-stationarity. A random walk (with or without drift) is transformed into a stationary process by differencing (subtracting Yt-1 from Yt), yielding Yt-Yt-1=εt or Yt-Yt-1=α+εt; the process is then said to be difference-stationary. If the data have a deterministic trend, detrending is needed: the process Yt=α+βt+εt is transformed into a stationary one by subtracting the trend βt, i.e., Yt-βt=α+εt, and no observation is lost in doing so. In the case of a random walk with a drift (a slow steady change) and a deterministic trend, detrending can remove the deterministic trend and the drift, but the variance continues to grow without bound; differencing must also be applied to remove the stochastic trend. The disadvantage of differencing is that one observation is lost each time the difference is taken. If a series must be differenced once (or twice) before it becomes stationary, it is said to be integrated of order one, I(1), or two, I(2), and it has one (or two) unit root(s) (random walk). A stationary series without a trend is said to be integrated of order 0.

When the level of the series is not stable over time, i.e., it shows an increasing or decreasing trend, we say that the series is not stationary in the mean. If the variability or autocorrelation changes over time, we say the series is not stationary in the variance or autocovariance. If the distribution of the variable at each point in time varies over time, we say that the series is not stationary in distribution. A stationary process, by contrast, reverts to a constant long-term mean and has a constant variance independent of time.
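A short sketch of the two transformations on simulated series (the variable names and parameter values are illustrative, not from the sources above):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 300
    t = np.arange(n)

    # Difference-stationary case: random walk with drift, Yt = 0.1 + Yt-1 + et.
    rw = np.cumsum(0.1 + rng.normal(size=n))
    d_rw = np.diff(rw)          # Yt - Yt-1; one observation is lost

    # Trend-stationary case: Yt = 2 + 0.1*t + et.
    ts = 2.0 + 0.1 * t + rng.normal(size=n)
    beta = sm.OLS(ts, sm.add_constant(t)).fit().params[1]
    detrended = ts - beta * t   # subtract the trend; no observation is lost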
A unit root test is then conducted, usually the Dickey-Fuller test. But autocorrelation (the error term at time t depending on the error term at time t-1) may pose problems. The Augmented Dickey-Fuller (parametric) and Phillips-Perron (non-parametric) tests are the recommended methods in that situation. The ADF and PP regressions attempt to control for serial correlation by including lagged values of the differenced variable (also called lagged difference terms). In the ADF test, the number of lags must be specified by the researcher; to determine the appropriate number, model fit indices such as Akaike's information criterion (AIC) can be used. The null hypothesis is that the variable is not stationary (has a unit root). It is rejected by examining the p-value, which can be problematic to the extent that p-values shrink as the number of observations grows. When a given variable is not stationary, it must be transformed and the transformed series submitted to the unit root test again. If the test statistic is significant, the variable is ready to be used in the fixed effects regression.
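A minimal sketch of the ADF test with statsmodels, applied to a simulated random walk in levels and then in first differences:

    import numpy as np
    from statsmodels.tsa.stattools import adfuller

    rng = np.random.default_rng(1)
    rw = np.cumsum(rng.normal(size=500))    # a pure random walk, I(1)

    # ADF test, lag length chosen by AIC; H0: the series has a unit root.
    stat, pvalue, usedlag, nobs, crit, icbest = adfuller(rw, autolag="AIC")
    print(f"levels: p = {pvalue:.3f}")      # typically large: cannot reject H0

    # Retest after differencing; a small p-value suggests the differenced
    # series is stationary, i.e., the original series is I(1).
    print(f"differences: p = {adfuller(np.diff(rw), autolag='AIC')[1]:.3f}")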
Given stationarity, it becomes possible to use a Granger causality (multiple regression) test. A variable X is said to Granger-cause another variable Y if Y can be better predicted by the lagged values of both X and Y than by the lagged values of Y alone. In other words, the test evaluates whether the lagged values of one variable improve the forecasts of another. However, it leads to incorrect inferences about causality when there is an error correction process at work, in which case an ECM approach is recommended instead. A related kind of analysis is the autoregressive model (process): a regression model for time series in which the series is explained by its own past values rather than by other variables, i.e., the output variable depends linearly on its own previous values.
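A sketch of both ideas on simulated data, using statsmodels' grangercausalitytests (which tests whether the second column Granger-causes the first) and a simple autoregressive fit; the data-generating process is ours:

    import numpy as np
    from statsmodels.tsa.ar_model import AutoReg
    from statsmodels.tsa.stattools import grangercausalitytests

    rng = np.random.default_rng(2)
    n = 500
    x = rng.normal(size=n)
    y = np.zeros(n)
    for t in range(1, n):
        # y depends on its own past and on lagged x, so x should Granger-cause y.
        y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.normal()

    # Tests whether the second column (x) Granger-causes the first (y).
    grangercausalitytests(np.column_stack([y, x]), maxlag=2)

    # An autoregressive model: y explained by its own past values alone.
    print(AutoReg(y, lags=2).fit().params)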
Given I(1) variables, performing the regression on differenced variables removes any long-term information carried by the levels of the variables, so that only inferences about changes are possible. The ECM circumvents this problem: if cointegration is found between the variables, an ECM can be estimated. Best (2008) notes that the low power of unit root tests can lead us to conclude that our data are integrated when they are not. When all variables are I(1), we must run the Johansen test of cointegration before running an ECM. If the variables are not cointegrated, an ECM is obviously not appropriate.
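A sketch of the Johansen test via statsmodels, on two simulated series built from a common stochastic trend (the construction and settings are illustrative):

    import numpy as np
    from statsmodels.tsa.vector_ar.vecm import coint_johansen

    rng = np.random.default_rng(3)
    n = 500
    common = np.cumsum(rng.normal(size=n))     # a shared stochastic trend
    x = common + rng.normal(size=n)
    y = 0.5 * common + rng.normal(size=n)      # cointegrated with x

    # Johansen trace test; det_order=0 allows a constant, k_ar_diff=1 lagged
    # difference. H0 for the first row: the cointegrating rank is zero.
    res = coint_johansen(np.column_stack([y, x]), det_order=0, k_ar_diff=1)
    print(res.lr1)    # trace statistics
    print(res.cvt)    # 90%/95%/99% critical values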
An illustration of the ECM is provided by Murray (1994) and Smith & Harrison (1995), who use the story of a drunk and her dog. These characters were chosen because they are supposed to wander aimlessly, so their paths are more or less unpredictable. Under non-stationarity, the more time passes, the more likely the dog and the drunk are to be far from their previous location (where they were last seen, e.g., at the bar). Under stationarity, we make just one additional assumption: that the dog belongs to the drunk, so that he adjusts the change in his position (Yt-Yt-1) in proportion α to his distance (Yt-1-Xt-1) from his mistress (e.g., whenever she calls him). The expression (Yt-1-Xt-1) captures the cointegrating, or long-run, relationship of the dog with his mistress. Here α corresponds to the speed-of-adjustment parameter and may take on values between 0 and 1: the more the dog is willing to stay close to his mistress, the closer α will be to unity. To the extent that the dog reduces the distance (i.e., the error) whenever he strays far from his mistress, an error-correction mechanism is present, because the distance between the two does not increase over time. It is as if an equilibrating process forces them to trend together. Their individual paths are still non-stationary, since as time goes on they are ever more likely to wander far from their previous location, but the distance between the two paths is stationary. The walks of the dog and the drunk are said to be cointegrated of order zero. In this setting, Smith & Harrison's (1995, eqs. 9 & 10) ECM test shows that the dog is attracted to the mistress but the reverse is not true.
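A minimal simulation of the story, under assumed parameter values (α = 0.3, unit noise variances): the drunk follows a pure random walk and the dog error-corrects toward her.

    import numpy as np

    rng = np.random.default_rng(4)
    n = 1000
    alpha = 0.3                  # speed of adjustment, between 0 and 1

    drunk = np.zeros(n)          # Xt: a pure random walk
    dog = np.zeros(n)            # Yt: error-corrects toward the drunk
    for t in range(1, n):
        drunk[t] = drunk[t - 1] + rng.normal()
        # Yt - Yt-1 = -alpha * (Yt-1 - Xt-1) + noise
        dog[t] = dog[t - 1] - alpha * (dog[t - 1] - drunk[t - 1]) + rng.normal()

    # Each path wanders far from its start, but the gap between them is
    # stationary: its standard deviation stays bounded.
    gap = dog - drunk
    print(np.std(gap), np.std(drunk))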
Put in more formal terms, the variables are non-stationary but their relationship is stationary (their linear combination is stationary). A stochastic (i.e., random) process is said to be integrated of order d, I(d), if the first difference operator, Δ, needs to be applied d times in order to achieve stationarity. For example, an I(2) series contains 2 unit roots and thus requires differencing twice to induce stationarity. More precisely, differencing of order 1 means replacing Yt by ΔYt, which is equivalent to Yt-Yt-1. An I(0) series should cross its mean frequently, whereas I(1) and I(2) series can wander a long way from their mean value and cross it rarely. Non-stationarity can be expressed as Yt=Yt-1+ut, or ΔYt=ut. When d=0, the series Yt is stationary in the levels of its values, and when d=1 it is the change in the levels from one time period to the next that is stationary.
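One rough way to gauge d in practice is to difference until an ADF test rejects the unit root; the routine below is our own illustrative heuristic, not a formal procedure:

    import numpy as np
    from statsmodels.tsa.stattools import adfuller

    def integration_order(series, max_d=2, alpha=0.05):
        """Difference until the ADF test rejects the unit root; return d."""
        y = np.asarray(series, dtype=float)
        for d in range(max_d + 1):
            if adfuller(y, autolag="AIC")[1] < alpha:
                return d
            y = np.diff(y)   # apply the difference operator once more
        return None          # stationarity not reached within max_d differences

    rng = np.random.default_rng(5)
    i2 = np.cumsum(np.cumsum(rng.normal(size=500)))  # double cumulation: I(2)
    print(integration_order(i2))                     # typically prints 2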
In light of this, Murray (1994) explains what happens if we attempt to regress a stationary variable on a nonstationary one: the observed association will tend to zero, as the variation in the stationary variable grows ever smaller relative to the variation in the nonstationary variable. This is why time series regressions depend so strongly on the assumption of stationarity. However, regression is possible if the two processes are cointegrated. For example, stock prices may follow random walks while certain portfolios of them are stationary (if the aggregate variables Y and X are proportional in the long run, then Zt=Yt/Xt will be stationary). The error in such a portfolio has a constant mean and standard deviation, and thus should not wander too far from its average. This (equilibrium) error correction term, or residual, denoted zt or ut, should stay close to zero (stationary). Lags of zt can be included to account for serial correlation.
Wooldridge (2012, pp. 646-648) reminds us that the strict definition of cointegration requires Yt-βXt to be I(0) without a trend (the series reverts to a constant mean, which need not be zero). He also says that, to find cointegration, we must obtain a t statistic much larger in magnitude than the usual DF critical values would require. This happens because OLS, which minimizes the sum of squared residuals, tends to produce residuals that look like an I(0) sequence even when Yt and Xt are not cointegrated.
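A sketch of the residual-based (Engle-Granger) test with statsmodels' coint, which applies critical values adjusted for the estimated cointegrating regression; the simulated series are illustrative:

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.tsa.stattools import coint

    rng = np.random.default_rng(6)
    n = 500
    common = np.cumsum(rng.normal(size=n))
    x = common + rng.normal(size=n)
    y = 2.0 + 0.5 * common + rng.normal(size=n)   # cointegrated with x

    # Engle-Granger test: regress y on x and test the residuals for a unit
    # root, with critical values larger in magnitude than the usual DF
    # values, as Wooldridge's point requires.
    tstat, pvalue, crit = coint(y, x)
    print(tstat, pvalue, crit)

    # The residuals zt estimate the equilibrium error and should hover near 0.
    z = sm.OLS(y, sm.add_constant(x)).fit().resid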
Best (2008) presents three types of ECM. The first is the basic form, expressed as ΔYt=α+β0ΔXt-1-β1ECt-1+εt. The second is the Engle and Granger two-step ECM, expressed as ΔYt=β0ΔXt-1-β1Zt-1, where Zt=Yt-βXt-α. Z is the cointegrating vector: it is obtained by regressing Yt on Xt and taking the residuals, and the lagged residuals (i.e., Zt-1) are then entered into a regression of ΔYt on ΔXt-1. The third type is the single-equation ECM, expressed as ΔYt=α+β0ΔXt-β1(Yt-1-β2Xt-1)+εt. The portion of the equation in parentheses is the error correction mechanism: (Yt-1-β2Xt-1)=0 when Y and X are in their equilibrium state. β0 estimates the short-term effect of an increase in X on Y. β1 estimates the speed of return to equilibrium after a deviation; if the ECM approach is appropriate, then -1 < β1 < 0. β2 estimates the long-term effect of a one-unit increase in X on Y, an effect distributed over future time periods according to the rate of error correction (β1). In general, what happens in an ECM is that a change in X causes a deviation from equilibrium, leaving Y too low (say), and Y then increases over subsequent periods to correct the disequilibrium.
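A sketch of the Engle and Granger two-step procedure on simulated cointegrated data (the data-generating values are ours):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(7)
    n = 500
    x = np.cumsum(rng.normal(size=n))          # an I(1) regressor
    y = 1.0 + 0.8 * x + rng.normal(size=n)     # cointegrated with x

    # Step 1: levels regression; the residuals estimate Zt = Yt - b*Xt - a.
    z = sm.OLS(y, sm.add_constant(x)).fit().resid

    # Step 2: regress dYt on dXt-1 and Zt-1 (observations aligned for t = 2..n-1).
    dy = np.diff(y)[1:]        # dYt
    dx_lag = np.diff(x)[:-1]   # dXt-1
    z_lag = z[1:-1]            # Zt-1
    ecm = sm.OLS(dy, sm.add_constant(np.column_stack([dx_lag, z_lag]))).fit()
    print(ecm.params)          # the Zt-1 coefficient should be negative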
If the variables are not cointegrated, we can still run a regression on the differenced variables, including lags. But such regressions explain the change in Y only in terms of the change in X; they need not say anything about a relationship in levels. In contrast, the ECM allows inferences about both the levels and the first differences of the variables.
References.
Best, R. (2008). An Introduction to Error Correction Models.
Murray, M. P. (1994). A drunk and her dog: an illustration of cointegration and error correction. The American Statistician, 48(1), 37-39.
Smith, A. D., & Harrison, R. (1995). A drunk, her dog and a boyfriend: an illustration of multiple cointegration and error correction. Department of Economics and Operations Research, University of Canterbury.
Wooldridge, J. (2012). Introductory econometrics: A modern approach. Cengage Learning.