Hypothesis testing to confirm the effect of changes in a time series situation
Update 2021/11/20:
- A/B testing is always the best thing to do
- the next best thing is to do a counterfactual estimation (so train a model on before X date, do a prediction and then compare the ‘real’ data to the prediction)
- another thing that can be done is to calculate the statistical power of the t-test to get an idea of the likelihood of correctly rejecting the null hypothesis when it is false (the significance level is only the probability of wrongly rejecting the null when it is true)
The risk / weakness of the 2 sample t-test is that the ‘changed’ data (after the X date) is included in the trend calculation, which is why the counterfactual is a better approach here. However, prediction is also tricky because the results can vary a lot (depending on the model used, the time series, how flexible the model is, etc.). Hence, A/B testing remains the best option
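To make the counterfactual idea concrete, here is a minimal sketch in base R. Everything in it is an assumption for illustration: the data is simulated, the change date is made up, and the model (a linear trend plus one yearly sine/cosine pair fitted with lm()) is just one of many possible choices. The point is only that the model is fitted on pre-change data and then compared against what actually happened afterwards.

```r
# Simulated weekly sales (hypothetical data): 3 years of trend + yearly seasonality + noise
set.seed(42)
n <- 156
week <- seq_len(n)
sales <- 100 + 0.1 * week + 10 * sin(2 * pi * week / 52) + rnorm(n, sd = 3)
df <- data.frame(week = week, sales = sales,
                 s1 = sin(2 * pi * week / 52),
                 c1 = cos(2 * pi * week / 52))

# Hypothetical change date: the last 16 weeks form the "after" period
cutoff <- n - 16
pre  <- df[df$week <= cutoff, ]
post <- df[df$week >  cutoff, ]

# Fit on pre-change data only, so the change cannot leak into the trend estimate
fit <- lm(sales ~ week + s1 + c1, data = pre)

# Counterfactual ("no change") path with 95% prediction intervals
pred <- predict(fit, newdata = post, interval = "prediction", level = 0.95)

# Compare the real data to the counterfactual
gap <- post$sales - pred[, "fit"]
outside <- post$sales < pred[, "lwr"] | post$sales > pred[, "upr"]
mean(gap)     # average difference between actual and predicted sales
sum(outside)  # number of post-change weeks outside the prediction interval
```

A different model would give a different counterfactual, which is exactly the weakness mentioned above.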
Notes from a current project; the process is taken from the blog post listed under Sources
Background example
We have the weekly sales of a product A for the last 3 years. Starting from January 2021, the company increased the product price by X%. If you don’t have access to A/B testing, how can you show that this price increase didn’t lead to any negative outcome?
Solution
Do a two sample t-test on the data before and after the price increase was introduced. The end goal is to know whether the average of one period is significantly different from the average of the other.
Overview process
1. Remove trend and seasonality from the data in order to obtain residuals
2. Check if the residuals have a normal distribution. If yes, we can move forward
3. Hypothesis testing and two sample T-test
1. Time Series decomposition to extract residuals
First, we decompose our time series to extract the residuals. One way to do this is to use STL. It is a versatile and robust method for decomposing time series. STL is an acronym for “Seasonal and Trend decomposition using Loess” (see https://otexts.com/fpp2/stl.html)
The 2 main parameters are trend-cycle window and the seasonal window. These control how rapidly the trend-cycle and seasonal components can change
We can use the R function stl() and tune these two windows by hand. We can also use mstl() from the forecast package to get an automated decomposition
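# Option A: automated decomposition (forecast package)
stl_ts <- forecast::mstl(your_ts)
# Option B: manual windows (this overwrites the result of option A)
stl_ts <- stl(your_ts, t.window = 21, s.window = 13, robust = TRUE)
plot(stl_ts)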
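As a sketch of what happens after the decomposition, the block below pulls the remainder out of a base R stl() result and splits it at the change date. The series is simulated and the dates (start in 2018, change in the first week of 2021) are assumptions chosen to mirror the background example, not real data.

```r
# Hypothetical weekly series: 3 years before the change plus 16 weeks after it
set.seed(1)
n <- 172
t_idx <- seq_len(n)
your_ts <- ts(100 + 0.1 * t_idx + 10 * sin(2 * pi * t_idx / 52) + rnorm(n, sd = 3),
              start = c(2018, 1), frequency = 52)

stl_ts <- stl(your_ts, t.window = 21, s.window = 13, robust = TRUE)

# stl() stores three columns: "seasonal", "trend" and "remainder".
# The remainder is what is left once trend and seasonality are removed.
residuals_ts <- stl_ts$time.series[, "remainder"]

# Split at the (assumed) change date: first week of 2021
before_period <- window(residuals_ts, end = c(2020, 52))
after_period  <- window(residuals_ts, start = c(2021, 1))

length(before_period)
length(after_period)
```

The two vectors before_period and after_period are the inputs of the t-test in step 3.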
2. Check if residuals follow the normal distribution
The t-test assumes that the populations from which the samples are selected have an approximately normal distribution, so this has to be checked. There are several ways to do this, for example plotting the distribution, using a Q-Q plot or running a statistical test such as Kolmogorov-Smirnov
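The checks listed above could look like the following in base R. This is a sketch on simulated residuals (a stand-in for the STL remainder). The Shapiro-Wilk test is my addition and is not mentioned in the original process. Note also that the Kolmogorov-Smirnov p-value is only approximate when the mean and standard deviation are estimated from the same sample.

```r
# Stand-in for the STL residuals (hypothetical data)
set.seed(7)
resid_example <- rnorm(150, mean = 0, sd = 3)

# Visual checks: histogram and Q-Q plot against the normal distribution
hist(resid_example, breaks = 20, main = "Residual distribution")
qqnorm(resid_example)
qqline(resid_example)

# Shapiro-Wilk test (null hypothesis: the data is normally distributed)
sw <- shapiro.test(resid_example)

# Kolmogorov-Smirnov test against a normal with the sample mean and sd
ks <- ks.test(resid_example, "pnorm",
              mean = mean(resid_example), sd = sd(resid_example))

sw$p.value
ks$p.value
```

A small p-value in either test is a sign that the normality assumption is doubtful. A large p-value does not prove normality, so the plots are still worth a look.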
3. Hypothesis testing and t-test
Hypothesis testing
- Null hypothesis: there is no difference between the weekly sales before and after the price increase was introduced
- Alternative hypothesis: weekly sales before the price increase are different from weekly sales after it
Two sample t-test on the residuals
“The two sample t-test compares the means of 2 samples. The purpose of this test is to determine whether the means of the populations from which the samples were drawn are the same” (see ‘Statistics in a Nutshell’, page 160). So we compare 2 periods, the one before the change occurred and the one after. Now the question is: for the “before” period, how far back should you go? I guess it depends on the project you work on and what your time series looks like. If you have seasonality, it can be interesting to compare the after period (let’s say 2021 Jan-Apr) to the same period over the previous years (2020 Jan-Apr, 2019 Jan-Apr…)
t.test(before_period, after_period, alternative = "two.sided", var.equal = FALSE)
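With var.equal = FALSE, t.test() runs the Welch version of the test, which does not assume equal variances in the two periods. To show what is computed, the sketch below reproduces the statistic, the degrees of freedom and the p-value by hand and checks them against t.test(). The two samples are simulated, and their sizes (140 and 16 weeks) are arbitrary.

```r
# Hypothetical residuals for the two periods
set.seed(123)
before_period <- rnorm(140, mean = 0, sd = 3)
after_period  <- rnorm(16, mean = 0, sd = 3)

res <- t.test(before_period, after_period,
              alternative = "two.sided", var.equal = FALSE)

# Welch statistic by hand: difference in means over its standard error
n1 <- length(before_period)
n2 <- length(after_period)
v1 <- var(before_period) / n1
v2 <- var(after_period) / n2
t_manual <- (mean(before_period) - mean(after_period)) / sqrt(v1 + v2)

# Welch-Satterthwaite degrees of freedom
df_manual <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# Two-sided p-value
p_manual <- 2 * pt(-abs(t_manual), df = df_manual)

# Each comparison should return TRUE
all.equal(unname(res$statistic), t_manual)
all.equal(unname(res$parameter), df_manual)
all.equal(res$p.value, p_manual)
```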
Result
If p-value > 0.05, we fail to reject the null hypothesis and conclude that we have no evidence of a difference in weekly sales before and after the product price was increased. If p-value ≤ 0.05, we reject the null hypothesis and conclude that weekly sales differ between the two periods
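A non-significant result is only reassuring if the test had a fair chance of detecting a real change, which is where the power calculation mentioned in the update comes in. Here is a rough sketch with power.t.test() from base R. All the numbers (16 weeks after the change, residual standard deviation of 3, smallest shift worth detecting of 2) are assumptions for illustration. power.t.test() also assumes equal group sizes and equal variances, so using the size of the smaller group gives a conservative estimate rather than an exact one.

```r
# Assumed inputs (hypothetical values)
n_after  <- 16   # weeks observed after the change (the smaller group)
resid_sd <- 3    # standard deviation of the residuals
effect   <- 2    # smallest shift in weekly sales that would matter

# Power of a two-sided, two sample t-test with n_after weeks per group
pw <- power.t.test(n = n_after, delta = effect, sd = resid_sd,
                   sig.level = 0.05, type = "two.sample",
                   alternative = "two.sided")
pw$power

# Number of weeks per group needed to reach 80% power for the same shift
needed <- power.t.test(delta = effect, sd = resid_sd, sig.level = 0.05,
                       power = 0.80, type = "two.sample",
                       alternative = "two.sided")
ceiling(needed$n)
```

If the power turns out to be low, "fail to reject" mostly reflects the lack of data and says little about the absence of an effect.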
Notes
- Residuals might be autocorrelated, which violates the independence assumption of the t-test
- Another potential technique is the causal impact approach: “on this approach, the original time series and another control time series are used to construct a model. The model will predict observation of the “alternate universe”; we can measure the impact of the actual decision by subtracting actual observation with the prediction”
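The autocorrelation concern in the first note can be checked before trusting the t-test. This is a sketch in base R on simulated residuals that are autocorrelated on purpose (an AR(1) process with coefficient 0.6, an arbitrary choice). The effective sample size formula at the end is a common approximation that assumes AR(1) behaviour, so it is a rough indication only.

```r
# Hypothetical residuals with AR(1) autocorrelation
set.seed(99)
resid_ar <- as.numeric(arima.sim(model = list(ar = 0.6), n = 150))

# Visual check: bars outside the dashed band suggest autocorrelation
acf(resid_ar, lag.max = 20, main = "ACF of residuals")

# Ljung-Box test (null hypothesis: no autocorrelation up to lag 10)
lb <- Box.test(resid_ar, lag = 10, type = "Ljung-Box")
lb$p.value

# Lag-1 autocorrelation and approximate effective sample size under AR(1)
r1 <- acf(resid_ar, plot = FALSE)$acf[2]
n_eff <- length(resid_ar) * (1 - r1) / (1 + r1)
r1
n_eff
```

When the effective sample size is much smaller than the number of weeks, the t-test is overconfident and its p-value is too small.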
Sources
- The process concept: https://elvyna.github.io/2018/time-series-hypothesis-testing/
- Related notebook: https://nbviewer.org/github/elvyna/data-analysis/blob/master/code/python/2018-09-16%20Bukalapak%20Nego%20Cincai%20-%20Time%20series%20hypothesis%20test.ipynb
- R time series: https://buildmedia.readthedocs.org/media/pdf/a-little-book-of-r-for-time-series/latest/a-little-book-of-r-for-time-series.pdf
- Excellent source for time series analysis: https://otexts.com/fpp2/stl.html
- Statistics in a nutshell (O’Reilly)