Hypothesis testing to confirm the effect of changes in a time series situation

Nov 8, 2021

Update 2021/11/20:

A/B testing is always the best option.
The next best option is a counterfactual estimation: train a model on the data before date X, predict the period after it, then compare the ‘real’ data with the prediction.
Another option is to calculate the statistical power of the t-test, i.e. the probability of correctly rejecting the null hypothesis when it is false (the significance level is only the probability of wrongly rejecting a true null).
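To give an idea of what the power calculation looks like, here is a small sketch with base R's power.t.test(). The group size, standard deviation and shift below are made-up values, not numbers from the project, and the function assumes both groups have the same size, so treat the result as a rough guide only.

```r
# Hypothetical inputs: 17 weekly residuals per group, residual sd of 10 units,
# and a shift of 5 units in mean weekly sales that we would like to detect.
pw <- power.t.test(n = 17, delta = 5, sd = 10, sig.level = 0.05,
                   type = "two.sample", alternative = "two.sided")
pw$power  # probability of detecting a true shift of 5 units

# Observations per group needed to reach 80% power for the same shift
n_needed <- power.t.test(delta = 5, sd = 10, sig.level = 0.05, power = 0.8,
                         type = "two.sample", alternative = "two.sided")$n
ceiling(n_needed)
```

A low power means that a non-significant t-test says very little, because the test would probably have missed a real change anyway.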

The weakness of the two-sample t-test is that the ‘changed’ data (after date X) is included in the trend calculation, so part of the effect can be absorbed into the trend. That is why the counterfactual is a better approach here. However, prediction is also tricky because the results can vary a lot depending on the model used, the time series, how flexible the model is, etc. Hence, A/B testing remains the best option.
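A minimal sketch of the counterfactual idea, using synthetic data and a plain linear model (trend plus one yearly sine/cosine pair) fitted with lm(). The model choice and all numbers are assumptions for illustration; in a real project a proper forecasting model would probably be used instead.

```r
set.seed(42)
# Synthetic weekly sales: 156 weeks before the change, 17 weeks after (made-up data)
n_pre <- 156; n_post <- 17
week <- 1:(n_pre + n_post)
sales <- 200 + 0.3 * week + 15 * sin(2 * pi * week / 52) +
  rnorm(n_pre + n_post, sd = 8)
df <- data.frame(week = week, sales = sales,
                 s1 = sin(2 * pi * week / 52), c1 = cos(2 * pi * week / 52))

pre  <- df[df$week <= n_pre, ]
post <- df[df$week >  n_pre, ]

# Fit trend + yearly seasonality on the pre-change weeks only
fit <- lm(sales ~ week + s1 + c1, data = pre)

# Predict the "no change" scenario for the post-change weeks
pred <- predict(fit, newdata = post, interval = "prediction", level = 0.95)

# Gap between actual and counterfactual sales,
# and share of post-change weeks falling outside the 95% prediction band
gap <- post$sales - pred[, "fit"]
outside <- mean(post$sales < pred[, "lwr"] | post$sales > pred[, "upr"])
mean(gap)
outside
```

Because the model never sees the post-change weeks, the change cannot leak into the estimated trend, which is the advantage over the t-test approach.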

Notes from a current project. The process is taken from the blog post linked under Sources.

Background example

We have the weekly sales of a product A for the last 3 years. Starting in January 2021, the company increased the product price by X%. If you don’t have access to A/B testing, how can you show that this price increase didn’t lead to any negative outcome?

Solution

Do a two-sample t-test on the data before and after the price increase was introduced. The end goal is to know whether the average of one period is significantly different from the average of the other.

Overview process

1. Remove trend and seasonality from the data in order to obtain residuals

2. Check if the residuals have a normal distribution. If yes, we can move forward

3. Hypothesis testing and two sample T-test

1. Time Series decomposition to extract residuals

First, we decompose our time series to extract the residuals. One way to do this is to use STL. It is a versatile and robust method for decomposing time series. STL is an acronym for “Seasonal and Trend decomposition using Loess” (see https://otexts.com/fpp2/stl.html)

The 2 main parameters are the trend-cycle window and the seasonal window. These control how rapidly the trend-cycle and seasonal components can change: smaller values allow more rapid changes.

We can use the R function stl() and tune these windows by hand to find a good decomposition. We can also use mstl() from the forecast package to get an automated decomposition.

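# Option 1: automated decomposition (requires the forecast package)
mstl_ts <- forecast::mstl(your_ts)
# Option 2: manual STL; t.window and s.window should be odd
stl_ts <- stl(your_ts, t.window = 21, s.window = 13, robust = TRUE)
plot(stl_ts)
residuals_ts <- stl_ts$time.series[, "remainder"]  # residuals used in steps 2 and 3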
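To make the decomposition step reproducible without real data, here is a sketch on a synthetic weekly series. The series itself is made up; the stl() call uses the same window values as above.

```r
set.seed(1)
# Synthetic weekly series covering 4 years (made-up data, frequency = 52)
n <- 52 * 4
idx <- 1:n
y <- 100 + 0.2 * idx + 10 * sin(2 * pi * idx / 52) + rnorm(n, sd = 3)
your_ts <- ts(y, start = c(2018, 1), frequency = 52)

# STL decomposition; robust = TRUE down-weights outliers
stl_ts <- stl(your_ts, t.window = 21, s.window = 13, robust = TRUE)

# Components are stored as columns: seasonal, trend, remainder
components <- stl_ts$time.series
residuals_ts <- components[, "remainder"]

# Sanity check: the three components add back up to the original series
max(abs(rowSums(components) - as.numeric(your_ts)))
```

Note that stl() needs at least two full seasonal periods, so with weekly data and yearly seasonality you need more than two years of history.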

2. Check if residuals follow the normal distribution

The t-test assumes that the populations from which the samples are drawn have an approximately normal distribution, so this has to be checked. There are several ways to do this, for example plotting the distribution, using a Q-Q plot or running a statistical test such as Kolmogorov-Smirnov.
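The checks mentioned above could look like this in base R. The residuals here are simulated stand-ins, and the Shapiro-Wilk test is an extra that I added as a common alternative; it is not part of the original process.

```r
set.seed(7)
# Stand-in for the STL remainder (made-up values); replace with your own residuals
residuals_ts <- rnorm(150, mean = 0, sd = 3)

# Visual checks: histogram and Q-Q plot against the normal distribution
hist(residuals_ts, breaks = 20, main = "Residuals")
qqnorm(residuals_ts); qqline(residuals_ts)

# Kolmogorov-Smirnov test against a normal with the sample mean and sd.
# Estimating the parameters from the same data makes this p-value approximate.
ks <- ks.test(residuals_ts, "pnorm", mean(residuals_ts), sd(residuals_ts))

# Shapiro-Wilk test, a common alternative for small to medium samples
sw <- shapiro.test(residuals_ts)

c(ks = ks$p.value, shapiro = sw$p.value)
```

For both tests the null hypothesis is that the data are normal, so a large p-value means there is no evidence against normality.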

3. Hypothesis testing and t-test

Hypothesis testing

Null hypothesis: there is no difference between the mean weekly sales before and after the price increase was introduced
Alternative hypothesis: the mean weekly sales before the price increase are different from the mean weekly sales after it

Two sample t-test on the residuals

“The two sample t-test compares the means of 2 samples. The purpose of this test is to determine whether the means of the populations from which the samples were drawn are the same” (see ‘Statistics in a Nutshell’, page 160). So we compare 2 periods, the one before the change occurred and the one after. Now the question is: for the “before” period, how far back should you go? I guess it depends on the project you work on and what your time series looks like. If you have seasonality, it can be interesting to compare the after period (let’s say Jan-Apr 2021) to the same period in the previous years (Jan-Apr 2020, Jan-Apr 2019, ...).

t.test(before_period, after_period, alternative = "two.sided", var.equal = FALSE)
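Here is a sketch of how the two periods could be selected and tested. The residuals are simulated, and the choice of "the first 17 weeks of each year" as a stand-in for Jan-Apr is my assumption for the example.

```r
set.seed(123)
# Made-up weekly residuals from 2018 up to week 17 of 2021
n <- 52 * 3 + 17
residuals_ts <- ts(rnorm(n, sd = 5), start = c(2018, 1), frequency = 52)

yr <- as.numeric(floor(time(residuals_ts) + 1e-8))  # calendar year of each week
wk <- as.numeric(cycle(residuals_ts))               # week position in the year (1..52)
x  <- as.numeric(residuals_ts)

# After: weeks 1-17 of 2021. Before: the same weeks in 2018-2020.
after_period  <- x[yr == 2021 & wk <= 17]
before_period <- x[yr <  2021 & wk <= 17]

# Welch two-sample t-test (var.equal = FALSE does not assume equal variances)
res <- t.test(before_period, after_period,
              alternative = "two.sided", var.equal = FALSE)

res$statistic  # t statistic
res$p.value    # compare with the chosen significance level, e.g. 0.05
res$conf.int   # 95% confidence interval for the difference in means

reject_null <- res$p.value < 0.05
```

Looking at the confidence interval as well as the p-value is useful, because it shows how large a difference is still compatible with the data.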

Result

If the p-value is greater than 0.05, we fail to reject the null hypothesis. We then conclude that we have no evidence of a difference in weekly sales before and after the product price was increased.

Notes

Residuals might be autocorrelated, which violates the independence assumption of the t-test
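One way to check for this is the Ljung-Box test together with the lag-1 autocorrelation, both available in base R. The two series below are simulated to show what a violation looks like next to a clean case; the lag of 10 is an arbitrary choice.

```r
set.seed(99)
# Made-up residuals with AR(1) autocorrelation, to show what a violation looks like
ar_resid <- as.numeric(arima.sim(model = list(ar = 0.6), n = 150))
# Independent residuals for comparison
iid_resid <- rnorm(150)

# Ljung-Box test: the null hypothesis is "no autocorrelation up to the given lag"
lb_ar  <- Box.test(ar_resid,  lag = 10, type = "Ljung-Box")
lb_iid <- Box.test(iid_resid, lag = 10, type = "Ljung-Box")
c(ar = lb_ar$p.value, iid = lb_iid$p.value)

# Lag-1 autocorrelation; values far from 0 signal dependence between weeks
rho1 <- acf(ar_resid, lag.max = 1, plot = FALSE)$acf[2]
rho1
```

If the test rejects, the t-test p-value is likely too optimistic, since positively correlated observations carry less information than the same number of independent ones.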
Another potential technique is the causal impact approach: “on this approach, the original time series and another control time series are used to construct a model. The model will predict observation of the “alternate universe”; we can measure the impact of the actual decision by subtracting actual observation with the prediction”
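To illustrate the idea of using a control series, here is a deliberately simplified sketch with plain linear regression. This is not the CausalImpact package (which fits a Bayesian structural time-series model); the data, the control series and the size of the drop are all made up.

```r
set.seed(2021)
n_pre <- 156; n_post <- 17
n <- n_pre + n_post

# Made-up control series (e.g. a product whose price did not change)
control <- 150 + 10 * sin(2 * pi * (1:n) / 52) + rnorm(n, sd = 4)

# Made-up target series that follows the control,
# with a drop of 12 units built in after the change
effect <- c(rep(0, n_pre), rep(-12, n_post))
target <- 20 + 1.1 * control + effect + rnorm(n, sd = 4)

# Learn the target/control relationship on the pre-change weeks only
is_pre <- seq_len(n) <= n_pre
fit <- lm(target ~ control, data = data.frame(target, control)[is_pre, ])

# Predict the "alternate universe" and subtract it from what actually happened
counterfactual <- predict(fit, newdata = data.frame(control = control[!is_pre]))
impact <- target[!is_pre] - counterfactual

mean(impact)  # estimated average weekly impact of the change
```

The approach only works if the control series is not itself affected by the change, which has to be argued from business knowledge rather than from the data.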

Sources

The process concept: https://elvyna.github.io/2018/time-series-hypothesis-testing/
Related notebook: https://nbviewer.org/github/elvyna/data-analysis/blob/master/code/python/2018-09-16%20Bukalapak%20Nego%20Cincai%20-%20Time%20series%20hypothesis%20test.ipynb
R time series: https://buildmedia.readthedocs.org/media/pdf/a-little-book-of-r-for-time-series/latest/a-little-book-of-r-for-time-series.pdf
Excellent source for time series analysis: https://otexts.com/fpp2/stl.html
Statistics in a nutshell (O’Reilly)