Quantile Regression Notion

Marc Deveaux
3 min read · Jul 10, 2022

What is not Quantile Regression

  • Quantile regression is not a regression estimated on a quantile!

What is Quantile Regression

  • Quantile regression provides an alternative to ordinary least squares (OLS) regression and related methods, which typically assume that the associations between independent and dependent variables are the same at all levels of the distribution
  • Quantile regression lets the analyst drop the assumption that variables operate the same way at the tails of the distribution as at the mean, and identify which factors are important determinants at each quantile

Advantages of Quantile Regression

  • It allows understanding relationships between variables outside of the mean of the data, making it useful in understanding outcomes that are non-normally distributed and that have nonlinear relationships with predictor variables
  • QR makes no assumption about the distribution of the target, so it is more robust to mis-specification of the error distribution
  • QR is less sensitive to outliers: OLS estimates the conditional mean, which is pulled around by extreme values, while QR estimates conditional quantiles (such as the median), which are not
  • It can be used to create prediction intervals
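The outlier point can be illustrated with a quick numpy sketch (the numbers are made up for illustration): adding one extreme value drags the mean far away, while the median, which is the 0.5 quantile, barely moves.

```python
import numpy as np

incomes = np.array([10.0, 11.0, 12.0, 13.0, 14.0])
with_outlier = np.append(incomes, 1000.0)  # one extreme value

print(np.mean(incomes), np.mean(with_outlier))      # 12.0 vs ~176.67
print(np.median(incomes), np.median(with_outlier))  # 12.0 vs 12.5
```

The mean jumps by more than an order of magnitude; the median moves by 0.5.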

When to use Quantile Regression

  • When you are interested in the prediction intervals
  • When you have outliers
  • When the residuals are not normal (when predicting income, for example)
  • When the distribution of the target variable is heteroscedastic (meaning the variance of y-values changes with increasing values of x) because QR does not make assumptions on the distribution of the target variable

Difference between OLS and QR

In OLS regression, the goal is to minimize the distances between the values predicted by the regression line and the observed values.

In contrast, quantile regression differentially weights the distances between the values predicted by the regression line and the observed values, then tries to minimize the weighted distances.

What do we mean by weighting the distances? Let's say we want to do QR on the 75th percentile (τ = 0.75):

  • a bigger weight (0.75) is given to data points located above the fitted line
  • a lower weight (0.25) is given to data points located below the fitted line, so those residuals are penalized less
  • doing so ensures that the loss is minimized when 75 percent of the residuals are negative, i.e. 75 percent of the points lie below the line

Conversely, if you calculate QR for the 5th percentile, you put a bigger weight (0.95) on the data points located below the line and a smaller weight (0.05) on those above. In the end, about 5% of your data points sit under the QR line.
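This asymmetric weighting is the pinball (quantile) loss. A minimal numpy sketch (the sample data and search grid are invented for illustration) shows that the value minimizing the average pinball loss at τ = 0.75 is close to the empirical 75th percentile:

```python
import numpy as np

def pinball_loss(y, q, tau):
    # weight tau for points above the prediction q, (1 - tau) for points below
    diff = y - q
    return np.mean(np.where(diff >= 0, tau * diff, (tau - 1) * diff))

rng = np.random.default_rng(0)
y = rng.normal(50, 10, 10_000)

# evaluate the average loss over a grid of candidate predictions
grid = np.linspace(20, 80, 601)
losses = [pinball_loss(y, q, tau=0.75) for q in grid]
best_q = grid[int(np.argmin(losses))]
# best_q lands near np.percentile(y, 75), not near the mean
```

Swapping τ for 0.05 would move the minimizer down to the 5th percentile instead.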

Prediction Interval

Let's say you are in financial risk management and run QR on the 0.1, 0.5 and 0.9 quantiles. Prediction intervals then allow you to make statements like: "we have 80% confidence that the financial loss will be between 10M and 70M, with the median around 40M".

You can then check the feature importances or the coefficients of the different QR models. In the case of income prediction, for example, the main features for the top 10% will likely differ from those for the bottom 10%.
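As a sketch of that workflow, here is one model per quantile fitted on synthetic heteroscedastic data (scikit-learn's GradientBoostingRegressor is used here because it also supports a quantile loss; the data is invented), followed by a check that the 10%–90% band covers roughly 80% of the observations:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# synthetic data where the noise grows with x (heteroscedastic)
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, 2000)
y = x + rng.normal(0, 0.5 + 0.3 * x)
X = x.reshape(-1, 1)

# one model per quantile: 0.1, 0.5 (median), 0.9
models = {
    q: GradientBoostingRegressor(loss="quantile", alpha=q,
                                 n_estimators=100, max_depth=3,
                                 random_state=0).fit(X, y)
    for q in (0.1, 0.5, 0.9)
}
lo = models[0.1].predict(X)
hi = models[0.9].predict(X)

# the 10%-90% band should cover roughly 80% of the observations,
# and should widen as x (and the noise) grows
coverage = np.mean((y >= lo) & (y <= hi))
```

The same one-model-per-quantile pattern applies to LightGBM, as in the snippet below, by varying the alpha parameter.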

Some useful Python code for LightGBM quantile regression

import numpy as np
import matplotlib.pyplot as plt
from lightgbm import LGBMRegressor

reg = LGBMRegressor(random_state=0,
                    alpha=0.5,  # quantile to predict (0.5 = median)
                    min_data_in_leaf=700,
                    num_leaves=80,
                    max_depth=9,
                    n_estimators=300,
                    objective='quantile',
                    boosting_type='gbdt')
reg.fit(x_train, y_train)
# make a single prediction
lightgbm_pred = reg.predict(x_test)

# plot intervals
# `result` is assumed to be a DataFrame holding the actual 'annualincome'
# plus the 10% and 90% quantile predictions ('pred_01' and 'pred_09')
result_200 = result.sample(n=150)
result_200 = result_200.sort_values(by=['annualincome'], ascending=True)
result_200 = result_200.reset_index(drop=True)

def showIntervals(df):
    plt.plot(df['annualincome'], 'go', markersize=3, label='annualincome')
    plt.fill_between(
        # pred_01 is the 10th percentile QR prediction, pred_09 the 90th
        np.arange(df.shape[0]), df['pred_01'], df['pred_09'],
        alpha=0.5, color='r', label='Predicted interval')
    plt.xlabel('Ordered samples')
    plt.ylabel('Values and prediction intervals')
    # plt.xlim([0, 150])
    # plt.ylim([0, 15_000_000])
    plt.legend()
    plt.show()

showIntervals(result_200)
