How to perform time series decomposition

Time series decomposition lets you separate the trend and seasonality in your data so you can see the patterns underneath. Here's how it's done.

Picture by Jatin Anand, Pexels.

Time series data have a reputation for being somewhat complicated, partly because they’re made up of several components that act together. At the most basic level, these are the trend, which shows whether a metric is going up or down over time, and the seasonality, a repeating pattern that can be yearly, monthly, weekly, or daily. Whatever the trend and seasonality don’t explain is left over as noise.

Many time series forecasting approaches use a technique called time series decomposition to split a series into these components, separating the trend and seasonality from the noise so that changes in the underlying metric being forecast are easier to identify. Done by hand, this can be quite a complex procedure, but it’s fairly straightforward with the Prophet model.

By default, Prophet models a time series as the sum of a trend, seasonal components, optional holiday effects and an error term, so it can show you each part on its own: you can see the underlying seasonality with the trend and noise removed, or the underlying trend with the noise and seasonality removed. Without this kind of decomposition, these patterns can be much harder to identify. Here’s how it’s done.
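Prophet performs the decomposition as part of fitting its model. To make the general idea concrete, here is a minimal sketch of a different and simpler technique, classical additive decomposition, using only pandas and NumPy on a synthetic daily series. The data and the `classical_decompose` helper are illustrative assumptions, not part of Prophet: the trend is a centred moving average over one seasonal period, the seasonal component is the average detrended value at each position in the cycle, and whatever remains is the residual noise.

```python
import numpy as np
import pandas as pd


def classical_decompose(y: pd.Series, period: int) -> pd.DataFrame:
    """Classical additive decomposition: y = trend + seasonal + residual.

    Hypothetical helper for illustration; assumes an odd `period` so the
    centred rolling mean is symmetric.
    """
    # Trend: centred moving average over one full seasonal cycle
    trend = y.rolling(window=period, center=True).mean()
    # Seasonal: mean detrended value at each position in the cycle
    detrended = y - trend
    position = np.arange(len(y)) % period
    seasonal_means = detrended.groupby(position).mean()
    # Centre the seasonal pattern so it sums to zero over one cycle
    seasonal_means -= seasonal_means.mean()
    seasonal = pd.Series(seasonal_means.to_numpy()[position], index=y.index)
    # Residual: whatever the trend and seasonality don't explain
    resid = y - trend - seasonal
    return pd.DataFrame({"observed": y, "trend": trend,
                         "seasonal": seasonal, "resid": resid})


# Synthetic daily series: upward trend + weekly cycle + random noise
rng = np.random.default_rng(0)
dates = pd.date_range("2020-01-01", periods=140, freq="D")
t = np.arange(len(dates))
weekly = np.array([5, 3, 2, 1, 0, -4, -7], dtype=float)
y = pd.Series(100 + 0.5 * t + weekly[t % 7] + rng.normal(0, 1, len(t)),
              index=dates)

parts = classical_decompose(y, period=7)
print(parts.dropna().head())
```

The first and last three days have no trend value because a centred 7-day window doesn’t fit there; Prophet avoids that edge effect by fitting the trend and seasonality as a model instead of smoothing.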

Load the packages

We’ll be using three packages for this project: Pandas for displaying and manipulating our data, GAPandas for fetching Google Analytics data using the Reporting API, and the Prophet forecasting model for time series decomposition. Any packages you don’t have can be installed by entering pip3 install package-name. Prophet was published as fbprophet before version 1.0 and as prophet from version 1.0 onwards.

import pandas as pd
import gapandas as gp

# Prophet 1.0 renamed the package from fbprophet to prophet,
# so try the new name first and fall back to the old one
try:
    from prophet import Prophet
except ImportError:
    from fbprophet import Prophet
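If you’re not sure which of these packages are already installed, a small standard-library check can tell you before you run the imports. This is a minimal sketch; `missing_modules` is a hypothetical helper, and it checks the import name prophet, so an older install that only provides fbprophet will be reported as missing.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names in `names` that can't be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names (which can differ from pip names) used in this tutorial
needed = ["pandas", "gapandas", "prophet"]
missing = missing_modules(needed)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All packages found")
```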

Configure GAPandas

If you already have a time series dataset, you can skip this step. If you want to perform time series decomposition on data from a Google Analytics account, you’ll need to set up GAPandas. This is explained in more detail in this guide, but you’ll need your JSON keyfile and the view ID for the account you want to access.

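# Authenticate using your service account's JSON keyfile
service = gp.get_service('client-secret.json', verbose=False)
# Replace with the ID of the Google Analytics view you want to query
view = '123456789'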
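If you don’t have Google Analytics access but want to follow along, you could substitute a synthetic dataframe with the same two columns the query returns, date and entrances. This is a minimal sketch with made-up numbers (a rising trend, a hypothetical day-of-week pattern and Poisson noise), not real traffic, and `make_fake_entrances` is a hypothetical helper. The dates are kept as strings so the later pd.to_datetime() step still applies.

```python
import numpy as np
import pandas as pd


def make_fake_entrances(start="2016-01-01", end="2021-01-01", seed=42):
    """Build a made-up daily 'entrances' dataframe shaped like the GA output."""
    rng = np.random.default_rng(seed)
    dates = pd.date_range(start, end, freq="D")
    t = np.arange(len(dates))
    # Hypothetical day-of-week multipliers (Monday=0 ... Sunday=6)
    dow = np.array([1.2, 1.0, 1.0, 0.9, 0.8, 0.9, 1.2])
    expected = (5 + 0.5 * t) * dow[dates.dayofweek.to_numpy()]
    return pd.DataFrame({
        # Dates as YYYY-MM-DD strings, converted to datetime later on
        "date": dates.strftime("%Y-%m-%d"),
        # Poisson noise keeps the counts as non-negative integers
        "entrances": rng.poisson(expected),
    })


df = make_fake_entrances()
print(df.head())
```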

Query the Google Analytics API

Next, we’ll create a simple API query payload and pass it to Google Analytics using GAPandas. This will return a time series dataframe in which your chosen metric is shown alongside the date. Google Analytics will automatically fill in any blanks with zeros. I’m examining some web analytics data for one of my personal sites.

payload = {
    'start_date': '2016-01-01',
    'end_date': '2021-01-01',
    'metrics': 'ga:entrances',
    'dimensions': 'ga:date'
}

df = gp.run_query(service, view, payload)
df.head()
date entrances
0 2016-01-01 7
1 2016-01-02 2
2 2016-01-03 2
3 2016-01-04 1
4 2016-01-05 6
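Google Analytics fills in missing days with zeros for you, but data from other sources may simply omit days with no activity. A minimal sketch of doing the same with pandas, assuming a date column and a count metric for which zero is the correct value on a missing day (`fill_missing_days` and the toy data are hypothetical):

```python
import pandas as pd


def fill_missing_days(df, date_col="date"):
    """Reindex to a complete daily range, filling absent days with 0.

    Assumes each date appears at most once.
    """
    out = df.copy()
    out[date_col] = pd.to_datetime(out[date_col])
    full_range = pd.date_range(out[date_col].min(), out[date_col].max(), freq="D")
    out = (out.set_index(date_col)
              .reindex(full_range, fill_value=0)  # new rows get 0
              .rename_axis(date_col)
              .reset_index())
    return out


# Toy data with 2016-01-03 and 2016-01-04 missing
sample = pd.DataFrame({
    "date": ["2016-01-01", "2016-01-02", "2016-01-05"],
    "entrances": [7, 2, 6],
})
print(fill_missing_days(sample))
```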

Reformat your data

The Prophet model requires a dataframe containing two columns: a datetime column called ds and a numeric column called y holding your metric. Use the Pandas rename() method to rename the columns, and convert the date column to datetime with the to_datetime() function.

df = df.rename(columns={'date':'ds', 'entrances':'y'})
df['ds'] = pd.to_datetime(df['ds'], format='%Y-%m-%d')
df.head()
ds y
0 2016-01-01 7
1 2016-01-02 2
2 2016-01-03 2
3 2016-01-04 1
4 2016-01-05 6

Fit your model

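Next, we’ll create a Prophet model with daily_seasonality=True and fit it to our df dataframe containing the ds and y columns. Be aware that the day-of-week variation in this site’s traffic is captured by Prophet’s weekly seasonality, which Prophet switches on automatically when the data span more than two weeks. daily_seasonality models how the metric varies within a single day, so it can only be estimated from sub-daily (for example, hourly) timestamps.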

model = Prophet(daily_seasonality=True)
model.fit(df)
<fbprophet.forecaster.Prophet at 0x7f102da58490>
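Prophet represents each seasonality with Fourier terms, sin(2πnt/P) and cos(2πnt/P), where t is time in days and P is the period (7 for weekly, 1 for daily), using 3 harmonics for weekly and 4 for daily seasonality by default. The small NumPy sketch below rebuilds those features (`fourier_features` is my own illustrative helper, not Prophet’s API) to show why daily seasonality can’t be learned from one timestamp per day at midnight: with P = 1, every daily feature takes the same value on every row, so it is indistinguishable from a constant, while the weekly features vary from day to day.

```python
import numpy as np
import pandas as pd


def fourier_features(dates, period, order):
    """Sine/cosine seasonal features for a `period` in days, `order` harmonics."""
    # Time in days since the Unix epoch
    t = ((dates - pd.Timestamp("1970-01-01")) / pd.Timedelta(days=1)).to_numpy(dtype=float)
    cols = []
    for n in range(1, order + 1):
        angle = 2 * np.pi * n * t / period
        cols.extend([np.sin(angle), np.cos(angle)])
    return np.column_stack(cols)


# One timestamp per day at midnight, like the GA data
dates = pd.date_range("2016-01-01", periods=28, freq="D")

daily = fourier_features(dates, period=1, order=4)
weekly = fourier_features(dates, period=7, order=3)

# Spread of each column: ~0 for the daily terms, clearly non-zero for weekly
print(daily.std(axis=0).round(6))
print(weekly.std(axis=0).round(3))
```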

Create a future dataframe

To make predictions, Prophet needs a dataframe of the dates it should forecast. The make_future_dataframe() method creates one containing all of the historical dates plus a number of future dates; I’ve set it to add the next 365 days.

future = model.make_future_dataframe(periods=365)
future.tail()
ds
2188 2021-12-28
2189 2021-12-29
2190 2021-12-30
2191 2021-12-31
2192 2022-01-01
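Conceptually, make_future_dataframe() takes the historical ds values and appends the requested number of later dates at the chosen frequency (daily by default). A rough pandas equivalent, offered as a sketch rather than Prophet’s own implementation (`future_dates` is a hypothetical helper), reproduces the row count and final date shown above for a history running from 2016-01-01 to 2021-01-01:

```python
import pandas as pd


def future_dates(history_ds, periods, freq="D", include_history=True):
    """Return a 'ds' dataframe: history dates plus `periods` future dates."""
    history_ds = pd.to_datetime(pd.Series(history_ds))
    last = history_ds.max()
    # Generate periods + 1 dates so the last historical date can be dropped
    new = pd.date_range(start=last, periods=periods + 1, freq=freq)
    new_ds = pd.Series(new[new > last][:periods])
    dates = pd.concat([history_ds, new_ds]) if include_history else new_ds
    return pd.DataFrame({"ds": dates.reset_index(drop=True)})


history = pd.date_range("2016-01-01", "2021-01-01", freq="D")
future = future_dates(history, periods=365)
print(future.tail())
```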

Create a forecast

Now we can get the Prophet model to predict y for the next 365 days using the predict() method. The forecast dataframe contains a yhat column holding the predicted value, plus yhat_lower and yhat_upper, which give the lower and upper bounds of the uncertainty interval (an 80% interval by default, controlled by Prophet’s interval_width parameter).

forecast = model.predict(future)
forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()

ds yhat yhat_lower yhat_upper
2188 2021-12-28 1031.545626 875.017722 1176.853593
2189 2021-12-29 1022.972801 866.592976 1174.910263
2190 2021-12-30 1024.170165 874.078189 1175.733781
2191 2021-12-31 1023.099677 876.396980 1176.541808
2192 2022-01-01 1048.272269 906.088285 1203.786461

For 2021-12-28, Prophet forecasts that the site will receive around 1,032 entrances (yhat), with an uncertainty interval running from about 875 (yhat_lower) to 1,177 (yhat_upper).
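Because yhat_lower and yhat_upper bound an 80% interval by default, one rough sanity check is to see what share of the historical actuals fall inside it. This is a minimal sketch, assuming you pass the original df and the forecast dataframe, which share the ds column; `interval_coverage` and the toy numbers below are hypothetical. A share far from 80% suggests the interval may be too narrow or too wide for your data, though outliers like the ones in this dataset will pull it down.

```python
import pandas as pd


def interval_coverage(actuals, forecast):
    """Share of actual y values inside [yhat_lower, yhat_upper] on matching dates."""
    merged = actuals[["ds", "y"]].merge(
        forecast[["ds", "yhat_lower", "yhat_upper"]], on="ds", how="inner"
    )
    # between() is inclusive of both bounds by default
    inside = merged["y"].between(merged["yhat_lower"], merged["yhat_upper"])
    return inside.mean()


# Toy example with made-up actuals and bounds
dates = pd.date_range("2020-12-28", periods=4, freq="D")
actuals = pd.DataFrame({"ds": dates, "y": [900, 1200, 1000, 1100]})
toy_forecast = pd.DataFrame({
    "ds": dates,
    "yhat_lower": [875, 866, 874, 876],
    "yhat_upper": [1176, 1174, 1175, 1176],
})
print(interval_coverage(actuals, toy_forecast))  # 0.75: 3 of 4 inside
```

On the real data, the equivalent call would be interval_coverage(df, forecast).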

Plotting the forecast

Next, we can plot the forecast using Prophet’s plot() method. The dark blue line shows the prediction from yhat, and the pale blue shaded band shows the uncertainty interval between yhat_lower and yhat_upper. The black dots are the actual data.

On my dataset there are some outliers, caused by site issues and sudden traffic spikes, as well as a level shift caused by traffic increasing and then dropping as a result of the pandemic. The period after the start of 2021 is our future period, so there are no black dots there.

forecast_plot = model.plot(forecast)


Decomposing the time series

Finally, now that we’ve fitted the model and made our forecast, we can perform the time series decomposition step.

Prophet makes this easy: simply call the model’s plot_components() method and pass it the forecast dataframe. This generates four separate plots, one for each component Prophet has extracted from the time series.

  • Trend: The trend plot shows the general direction in which the metric is going over time. Entrances to my website are rising steadily year-on-year.
  • Weekly: The weekly plot shows the weekly seasonality for this website. It’s busiest on Sunday and Monday and quietest on Friday, with Saturday a little less busy than Sunday.
  • Yearly: The yearly plot shows how the traffic changes over the course of a year. It peaks during the summer months and drops to its lowest level over Christmas, before picking up again in January and rising until spring.
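  • Daily: The daily plot is meant to show how busy the site is by hour of the day. However, this dataset has only one observation per day, recorded at midnight, so Prophet has no information about intra-day patterns and this panel shouldn’t be read as hourly behaviour. Estimating it would require hourly data, and because the site’s visitors are spread across different time zones, that data would also need to be segmented to improve accuracy.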
forecast_components = model.plot_components(forecast)

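The values behind these panels are also stored as columns in the forecast dataframe (for example trend, weekly and yearly), so you can inspect them as numbers as well as plots. A minimal sketch, assuming a forecast with ds and weekly columns, that averages the weekly component by day of the week (`weekly_profile` is a hypothetical helper, and the toy frame below is made up):

```python
import numpy as np
import pandas as pd

DAY_ORDER = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def weekly_profile(forecast):
    """Mean of the 'weekly' component for each day of the week, Monday first."""
    day = forecast["ds"].dt.day_name()
    return forecast.groupby(day)["weekly"].mean().reindex(DAY_ORDER)


# Toy forecast frame with a made-up weekly effect keyed on day of week
ds = pd.date_range("2021-01-04", periods=14, freq="D")  # starts on a Monday
effect = np.array([30.0, 10.0, 5.0, 0.0, -40.0, -10.0, 35.0])
toy = pd.DataFrame({"ds": ds, "weekly": effect[ds.dayofweek.to_numpy()]})

profile = weekly_profile(toy)
print(profile)
print("Busiest:", profile.idxmax(), "| Quietest:", profile.idxmin())
```

With a real model, weekly_profile(forecast) gives the same day-by-day figures that the weekly panel plots.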

Matt Clarke, Saturday, March 13, 2021

Matt Clarke Matt is a Digital Director who uses data science to help in his work. He has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.
