Quantile Regression

Beyond the mean: Modeling conditional quantiles for robust regression and heteroskedastic data.

Introduction

In traditional linear regression, we attempt to model the conditional mean by assuming that errors are normally distributed with zero mean and constant variance.

$$ \epsilon \sim N(0, \sigma^2) $$

Unfortunately, real-world data are often not normally distributed. Data may be heteroskedastic, contain many outliers, or follow skewed or heavy-tailed distributions. In these cases, linear regression may not be the best model to use.

Quantile regression is a useful alternative when the data are not normally distributed. It models the conditional median (or any other quantile) of the response variable instead of the conditional mean; by fitting several quantiles at once, it characterizes the whole conditional distribution rather than just its center.

The Pinball Loss Function

The difference between quantile regression and linear regression lies solely in the loss function. In quantile regression, a pinball loss function is used, defined as:

$$ \rho_\tau(u) = u(\tau - I(u<0)) $$

or equivalently:

$$ \rho_\tau(u) = \begin{cases} \tau u & \text{if } u \ge 0 \\ (\tau - 1) u & \text{if } u < 0 \end{cases} $$
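The equivalence of the indicator form and the piecewise form is easy to check numerically. A minimal NumPy sketch (the function names here are ours, not from any library):

```python
import numpy as np

def pinball_loss(u, tau):
    """Pinball loss via the indicator form: u * (tau - I(u < 0))."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))

def pinball_loss_cases(u, tau):
    """Equivalent piecewise form: tau*u if u >= 0, (tau-1)*u if u < 0."""
    u = np.asarray(u, dtype=float)
    return np.where(u >= 0, tau * u, (tau - 1) * u)

u = np.linspace(-3, 3, 13)
assert np.allclose(pinball_loss(u, 0.25), pinball_loss_cases(u, 0.25))
```

Note that at $\tau = 0.5$ both forms reduce to $\frac{1}{2}|u|$, i.e. a scaled absolute error.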
The shape of the loss changes with the quantile: $\tau=0.5$ yields the symmetric absolute error (median regression), while more extreme values of $\tau$ penalize over- and under-predictions asymmetrically.

Intuition: If we want to model the 0.25 quantile (a lower quantile), the loss function penalizes points lying below the line more heavily (weight $1-\tau = 0.75$ per unit of error) than points above it (weight $\tau = 0.25$). Optimization balances these weights so that approximately 25% of points fall below the fitted line.
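This balancing act is easiest to see in the one-dimensional case: minimizing the average pinball loss over a single constant recovers the empirical quantile. A small sketch (the grid search is purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=10_000)

tau = 0.25
# Average pinball loss for each candidate constant prediction c
candidates = np.linspace(-3, 3, 601)
losses = [np.mean(np.where(y - c >= 0, tau * (y - c), (tau - 1) * (y - c)))
          for c in candidates]
best = candidates[np.argmin(losses)]

# The minimizer sits at (approximately) the empirical 25th percentile
print(best, np.quantile(y, tau))
```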

Examples: When to use Quantile Regression

1. Heteroskedasticity

Notice how quantile regression accurately captures the heteroskedasticity by estimating conditional quantiles separately. The spread of the quantiles increases as X increases.

# Generate data with heteroskedasticity
import numpy as np

np.random.seed(42)
n = 1000
X = np.random.uniform(0, 10, n)
y = 2 * X + X * np.random.normal(0, 0.5, n)  # Variance increases with X

2. Skewed Distribution (Log-Normal)

Notice how quantile regression accurately captures the skewness by estimating individual conditional quantiles, unlike least-squares intervals, which assume the errors are symmetric around the mean.

# Generate skewed errors from log-normal distribution
skewed_errors = np.random.lognormal(mean=0, sigma=0.8, size=n)
skewed_errors = skewed_errors - np.mean(skewed_errors) # Center errors
y = 2 * X + skewed_errors

3. Heavy-Tailed Distribution (Cauchy)

The Cauchy distribution has no defined mean or variance. Least-squares regression is dragged around by the extreme outliers, while quantile regression, which depends only on the ordering of the residuals, remains stable.

# Generate errors from Cauchy distribution
cauchy_errors = np.random.standard_cauchy(size=n) * 2
y = 2 * X + cauchy_errors

Prediction Interval vs Confidence Interval

We compare the prediction interval derived from linear regression with the corresponding quantile band estimated by a quantile regression model. Quantile regression provides a more robust estimate of the conditional distribution, since it makes no normality assumption about the errors.

Note that confidence intervals (CI) estimate the range for the mean response, whereas prediction intervals (PI) predict the range for a single, new observation.

Formula for prediction interval:

$$ x_0^T\hat{\beta} \pm t_{n-p, \alpha /2}\hat{\sigma}\sqrt{1+x_0^T(X^TX)^{-1}x_0} $$

Formula for confidence interval:

$$ x_0^T\hat{\beta} \pm t_{n-p, \alpha /2}\hat{\sigma}\sqrt{x_0^T(X^TX)^{-1}x_0} $$

The difference is the "+1" inside the square root. The prediction interval is always wider, as it accounts for the variance of the error term $\epsilon$ in addition to the uncertainty in the coefficient estimates.
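Both formulas can be computed by hand from the least-squares quantities. A minimal sketch on simulated data (variable names and the 95% level are our choices), showing that the "+1" makes the PI strictly wider:

```python
import numpy as np
from scipy import stats

np.random.seed(0)
n, p = 50, 2
X = np.column_stack([np.ones(n), np.random.uniform(0, 10, n)])
beta = np.array([1.0, 2.0])
y = X @ beta + np.random.normal(0, 1.5, n)

# Least-squares fit and residual standard error
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat
sigma_hat = np.sqrt(resid @ resid / (n - p))
t_crit = stats.t.ppf(0.975, df=n - p)

x0 = np.array([1.0, 5.0])                # new point at X = 5
h = x0 @ XtX_inv @ x0                    # leverage term x0^T (X^T X)^{-1} x0
ci_half = t_crit * sigma_hat * np.sqrt(h)      # CI half-width
pi_half = t_crit * sigma_hat * np.sqrt(1 + h)  # PI half-width (note the +1)
print(ci_half, pi_half)
```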

Derivation of the Quantile Loss Function

We want to find a prediction $\hat{y}$ that minimizes the expected loss for a given quantile $\tau \in (0,1)$.

Step 1: Define asymmetric absolute loss

$$ L_\tau(y, \hat{y}) = \begin{cases} c_1 |y - \hat{y}| & \text{if } y \ge \hat{y} \\ c_2 |y - \hat{y}| & \text{if } y < \hat{y} \end{cases} $$

Step 2: Express expected loss

$$ E[L_\tau(Y, \hat{y})] = c_1 \int_{\hat{y}}^{\infty} (y - \hat{y}) f_Y(y) \, dy + c_2 \int_{-\infty}^{\hat{y}} (\hat{y} - y) f_Y(y) \, dy $$

Step 3: Take derivative with respect to $\hat{y}$

Using Leibniz's rule for differentiation under the integral sign:

$$ \frac{\partial}{\partial \hat{y}} E[L_\tau] = -c_1 \int_{\hat{y}}^{\infty} f_Y(y) \, dy + c_2 \int_{-\infty}^{\hat{y}} f_Y(y) \, dy $$

This simplifies to:

$$ \frac{\partial}{\partial \hat{y}} E[L_\tau] = -c_1 [1 - F_Y(\hat{y})] + c_2 F_Y(\hat{y}) $$

Step 4: Set derivative to zero for optimal prediction

$$ -c_1 [1 - F_Y(\hat{y})] + c_2 F_Y(\hat{y}) = 0 $$

$$ c_2 F_Y(\hat{y}) = c_1 [1 - F_Y(\hat{y})] = c_1 - c_1 F_Y(\hat{y}) $$

$$ (c_1 + c_2) F_Y(\hat{y}) = c_1 $$

$$ F_Y(\hat{y}) = \frac{c_1}{c_1 + c_2} $$

Step 5: Set this equal to the desired quantile

$$ \tau = \frac{c_1}{c_1 + c_2} $$

Step 6: Choose convenient constants

A convenient normalization is to set $c_1 + c_2 = 1$, which gives:

$$ c_1 = \tau \quad \text{and} \quad c_2 = 1 - \tau $$

Step 7: The resulting loss function

Substituting these constants back into our original loss function:

$$ \rho_\tau(u) = \begin{cases} \tau |u| & \text{if } u \ge 0 \\ (1-\tau) |u| & \text{if } u < 0 \end{cases} $$
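The derivation can be closed with a numerical check: for a known distribution, the minimizer of the expected pinball loss coincides with $F_Y^{-1}(\tau)$. A sketch using SciPy's numerical integration (the search bounds are arbitrary choices):

```python
import numpy as np
from scipy import optimize, stats

tau = 0.75

def expected_loss(y_hat):
    # E[rho_tau(Y - y_hat)] for Y ~ N(0, 1), split at y_hat as in Step 2
    upper = stats.norm.expect(lambda y: tau * (y - y_hat), lb=y_hat)
    lower = stats.norm.expect(lambda y: (1 - tau) * (y_hat - y), ub=y_hat)
    return upper + lower

res = optimize.minimize_scalar(expected_loss, bounds=(-3, 3), method="bounded")
print(res.x, stats.norm.ppf(tau))  # both ≈ 0.674
```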

Implementation in Python

import statsmodels.api as sm
from statsmodels.regression.quantile_regression import QuantReg

# Add constant (intercept) to predictor variables
X_with_const = sm.add_constant(X)

# The model object does not depend on the quantile; only fit() takes q
model = QuantReg(y, X_with_const)

# Fit quantile regression for multiple quantiles
quantiles = [0.1, 0.25, 0.5, 0.75, 0.9]

for tau in quantiles:
    # Fit the model at quantile tau
    result = model.fit(q=tau)

    # Make predictions
    predictions = result.predict(X_with_const)

    # Print summary
    print(result.summary())