When trend and seasonality is present in a time series, instead of decomposing it manually to fit an ARMA model using the Box Jenkins method, another very popular method is to use the seasonal autoregressive integrated moving average (SARIMA) model which is a generalization of an ARMA model. SARIMA models are denoted SARIMA(p,d,q)(P,D,Q)[S], where S refers to the number of periods in each season, d is the degree of differencing (the number of times the data have had past values subtracted), and the uppercase P, D, and Q refer to the autoregressive, differencing, and moving average terms for the seasonal part of the ARIMA model.

The SARIMA model is a bit complex to write out directly so a backshift operator is needed to describe it. For example SARIMA(1,1,1)(1,1,1)[4] is written as:

The backward shift operator B is a useful notational device when working with time series lags: By(t)=y(t−1)

These are all rule of thumbs, not an exact science for picking the number of each parameters in SARIMA(p,d,q)(P,D,Q)[S]. It is an art in picking good parameters from the ACF/PACF plots. The following rules also apply to ARMA and ARIMA models.

**Identifying the order of differencing:**

d=0 if the series has no visible trend or ACF at all lags is low.

d≥1 if the series has visible trend or positive ACF values out to a high number of lags.

Note: if after applying differencing to the series and the ACF at lag 1 is -0.5 or more negative the series may be overdifferenced.

Note: If you find the best d to be d=1 then the original series has a constant trend. A model with d=2 assumes that the original series has a time-varying trend.

**Identifying the number of AR and MA terms**

p is equal to the first lag where the PACF value is above the significance level.

q is equal to the first lag where the ACF value is above the significance level.

**Identifying the seasonal part of the model:**

S is equal to the ACF lag with the highest value (typically at a high lag).

D=1 if the series has a stable seasonal pattern over time.

D=0 if the series has an unstable seasonal pattern over time.

Rule of thumb: d+D≤2

P≥1 if the ACF is positive at lag S, else P=0.

Q≥1 if the ACF is negative at lag S, else Q=0.

Rule of thumb: P+Q≤2

### Grid search for SARIMA model selection

Doing a full manual time series analysis can be a tedious task, especially when you have many data sets to analyze. It is preferred to then automate the task of model selection with grid search. For SARIMA, since we have many parameters, grid search may take hours to complete on one data set if we set the limit of each parameter too high. Setting the limits too high will also make your model too complex and overfit the training data.

To prevent the long runtime and overfitting problem, we apply what is known as the parsimony principle where we create a combination of all parameters such that p+d+q+P+D+Q≤ 6. Another approach is to set each parameter as 0 or 1 or 2 and do grid search using AIC with each combination. It is more common in forecasting studies to apply grid search on SARIMA when you are using it as a benchmark method to more advanced models such as deep neural networks.

But as a reminder, grid search may not always give you the best model. To get the best model you may need to manually experiment with different parameters using the ACF/PACF plots.

**Python Tutorial**After loading in our time series we plot it, here we use the classical Air Passengers time series.

From inspecting the plot we can conclude that this time series has a positive linear trend, multiplicative seasonal patterns, and possibly some irregular patterns. This information strongly suggests for us to use a SARIMA model to do our forecasting. Let’s get to it! First we split 70% of data for training and 30% fo testing.

Next since the data has multiplicative seasonality we apply a log filter and then analyze the residuals with autocorrelation plots.

We see here that there is no more a multiplicative affect and no more trend. However, an unstable seasonal pattern is still present in this residual series. It indicates that we need to remove the seasonal pattern which can be done with SARIMA. We can select the seasonal pattern parameters of SARIMA by looking at the ACF and PACF plots.

Looking at the ACF and PACF plots of the differenced series we see our first significant value at lag 4 for ACF and at the same lag 4 for the PACF which suggest to use p = 4 and q = 4. We also have a big value at lag 12 in the ACF plot which suggests our season is S = 12 and since this lag is positive it suggests P = 1 and Q = 0. Since this is a differenced series for SARIMA we set d = 1, and since the seasonal pattern is not stable over time we set D = 0. All together this gives us a SARIMA(4,1,4)(1,0,0)[12] model. Next we run SARIMA with these values to fit a model on our training data.

We can see here that the multi-step forecast of our SARIMA(4,1,4)(1,0,0)[12] model fits the testing data extremely well with an RMSE of 23.7! When you manually conduct a good time series analysis, as I have done here, it will be difficult to beat ARMA models for forecasting. I also ran grid search and found the best model to be SARIMA(1, 0, 1)x(1, 1, 1)[12] which had an AIC of 696.05. This resulted in a forecast with an RMSE of 24.74, which is also pretty good! In conclusion depending on your forecasting problem, SARIMA is always a great choice to choose.

http://www.datasciencecentral.com/xn/detail/6448529:BlogPost:712181