Generalized Linear Models (GLM)

An extension of traditional linear regression that allows for non-normal error distributions through the use of link functions and exponential family distributions.

Introduction to GLM

In Generalized Linear Models, we extend the traditional linear model by relaxing the assumption that the error term must be normally distributed. GLM generalizes linear regression under two key conditions:

  1. Distribution: \(Y|X\) follows some exponential family distribution
  2. Link Function: A link function maps the conditional expectation to the linear predictor: \(g(\mu) = X^T\beta\)

This framework allows us to model binary outcomes (logistic regression), count data (Poisson regression), and many other types of responses.
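To make the link-function idea concrete, here is a small NumPy sketch (an illustration of my own, not from the source) showing the logit link used for binary outcomes and the log link used for counts, and verifying that each link and its inverse round-trip correctly:

```python
import numpy as np

def logit(mu):
    """Logit link for Bernoulli outcomes: g(mu) = log(mu / (1 - mu))."""
    return np.log(mu / (1.0 - mu))

def sigmoid(eta):
    """Inverse logit: maps the linear predictor back to a probability."""
    return 1.0 / (1.0 + np.exp(-eta))

mu = np.array([0.1, 0.5, 0.9])
eta = logit(mu)                       # values on the linear-predictor scale
print(np.allclose(sigmoid(eta), mu))  # round-trip recovers the means

# Log link for Poisson counts: g(mu) = log(mu), with inverse exp(eta)
rates = np.array([0.5, 2.0, 10.0])
print(np.allclose(np.exp(np.log(rates)), rates))
```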

Canonical Exponential Family

A subset of the exponential family, called the canonical exponential family, has probability density function:

\[f_{\theta}(y) = \exp\left(\frac{y\theta - b(\theta)}{\phi} + c(y, \phi)\right)\]

where \(b(\theta)\) is the cumulant generating function and \(\phi > 0\) is the dispersion parameter. This formulation gives us important relationships:

\[b'(\theta) = \mu\] \[\mu = g^{-1}(X^T\beta)\] \[\theta = b'^{-1}(g^{-1}(X^T\beta)) = h(X^T\beta)\]
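As a quick check of these relationships, the Bernoulli distribution can be put into canonical form (a standard derivation, with \(\phi = 1\)):

```latex
% Bernoulli(p) written as a canonical exponential family:
f_p(y) = p^y (1-p)^{1-y}
       = \exp\!\left( y \log\frac{p}{1-p} + \log(1-p) \right)
% Matching terms against f_theta(y) above:
%   theta = log(p / (1-p)),   b(theta) = log(1 + e^theta),
%   phi = 1,                  c(y, phi) = 0.
% And indeed b'(theta) = e^theta / (1 + e^theta) = p = mu,
% recovering the identity b'(theta) = mu.
```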

The log-likelihood function (dropping the \(c(Y_i, \phi)\) terms, which do not depend on \(\beta\)) becomes:

\[l_n(\beta, \phi;Y,X) = \sum_i \frac{Y_i h(X_i^T\beta) - b(h(X_i^T\beta))}{\phi}\]

Notice that \(b\) is a convex function (\(b'' > 0\)). Under the canonical link, \(h\) is the identity, so each term \(Y_i\theta_i - b(\theta_i)\) is linear minus convex in \(\beta\), making the log-likelihood concave. Strict concavity guarantees that any maximizer is the unique global maximum.
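A quick numerical sanity check of this convexity (a NumPy sketch of my own): for the Bernoulli family, \(b(\theta) = \log(1 + e^{\theta})\), and \(b''(\theta) = \sigma(\theta)(1-\sigma(\theta))\) is strictly positive everywhere:

```python
import numpy as np

def b(theta):
    """Cumulant generating function of the Bernoulli family."""
    return np.log1p(np.exp(theta))

theta = np.linspace(-5, 5, 101)
h = 1e-4
# Central second difference approximates b''(theta)
b2 = (b(theta + h) - 2 * b(theta) + b(theta - h)) / h**2

sigma = 1 / (1 + np.exp(-theta))
print(np.all(b2 > 0))                                    # convexity: b'' > 0
print(np.allclose(b2, sigma * (1 - sigma), atol=1e-5))   # b'' = sigma(1 - sigma)
```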

Optimization Methods for GLM

Several iterative methods can be used to find the maximum likelihood estimates:

Newton-Raphson
\[\beta^{(k+1)} = \beta^{(k)} - H_{l_n}(\beta^{(k)})^{-1}\nabla l_n(\beta^{(k)})\]

Uses quadratic approximation. Computing the inverse Hessian is expensive.
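As an illustration, here is a minimal Newton-Raphson loop for logistic regression in plain NumPy (the simulated data and variable names are my own). With the canonical link, the gradient is \(X^T(Y - p)\) and the Hessian is \(-X^T \mathrm{diag}(p_i(1-p_i)) X\), so each step solves a linear system rather than forming an explicit inverse:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one feature
beta_true = np.array([-0.5, 2.0])
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta_true)))

beta = np.zeros(2)
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ beta))
    grad = X.T @ (y - p)               # gradient of the log-likelihood
    W = p * (1 - p)                    # negative Hessian is X^T diag(W) X
    step = np.linalg.solve(X.T @ (W[:, None] * X), grad)
    beta = beta + step
    if np.max(np.abs(step)) < 1e-10:   # converged
        break

print(beta)  # should be near beta_true = [-0.5, 2.0]
```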

Fisher Scoring
\[E_{\theta}[H_{l_n}(\theta)] = -I(\theta)\] \[\beta^{(k+1)} = \beta^{(k)} + I(\beta^{(k)})^{-1}\nabla l_n(\beta^{(k)})\]

Replaces the Hessian with its expectation, the negative Fisher information. Under the canonical link, the Hessian is non-random and equals \(-I(\beta)\) exactly, so Fisher scoring coincides with Newton-Raphson.

IRLS (Iteratively Reweighted Least Squares)
\[\beta^{(k+1)} = (X^T W X)^{-1} X^T W z^{(k)}, \qquad z_i^{(k)} = X_i^T\beta^{(k)} + g'(\mu_i)(Y_i - \mu_i)\]

where \(W\) is diagonal with \(W_i = \frac{h'(X_i^T\beta^{(k)})}{g'(\mu_i)\phi}\). Each iteration solves a weighted least-squares problem with working response \(z^{(k)}\).
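To make the general recipe concrete, here is an IRLS sketch for Poisson regression with the canonical log link (a NumPy illustration of my own). In that case \(\phi = 1\), \(g'(\mu) = 1/\mu\), and \(h\) is the identity, so \(W_i = \mu_i\) and the working response is \(z_i = \eta_i + (Y_i - \mu_i)/\mu_i\):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
X = np.column_stack([np.ones(n), rng.uniform(-1, 1, size=n)])
beta_true = np.array([0.5, 1.0])
y = rng.poisson(np.exp(X @ beta_true))

beta = np.zeros(2)
for _ in range(50):
    eta = X @ beta
    mu = np.exp(eta)                 # inverse log link
    W = mu                           # weights: W_i = mu_i for Poisson
    z = eta + (y - mu) / mu          # working response
    beta_new = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))
    if np.max(np.abs(beta_new - beta)) < 1e-10:
        beta = beta_new
        break
    beta = beta_new

print(beta)  # should be close to beta_true = [0.5, 1.0]
```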

Demonstration: Logistic Regression with Logit Link

Logistic regression is a special case of GLM where:

  • The response \(Y\) follows a Bernoulli distribution
  • The link function is the logit function: \(g(\mu) = \log\left(\frac{\mu}{1-\mu}\right) = X^T\beta\)
  • This gives us the familiar sigmoid function: \(\mu = \frac{1}{1 + e^{-X^T\beta}}\)

IRLS for Logistic Regression

For logistic regression, the IRLS updates have a particularly elegant form:

\[W_i = p_i(1-p_i)\] \[z_i = \eta_i + \frac{Y_i - p_i}{p_i(1-p_i)}\] \[\beta^{(k+1)} = (X^T W X)^{-1} X^T W z\]

where \(p_i = \frac{1}{1 + e^{-\eta_i}}\) and \(\eta_i = X_i^T\beta^{(k)}\).
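These updates translate directly into code. Below is a minimal NumPy sketch (simulated data and variable names are my own); note that for the canonical logit link these IRLS iterates are algebraically identical to the Newton-Raphson iterates:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 800
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.3, -1.5])
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta_true)))

beta = np.zeros(2)
for _ in range(30):
    eta = X @ beta                     # eta_i = X_i^T beta
    p = 1 / (1 + np.exp(-eta))        # p_i = sigmoid(eta_i)
    W = p * (1 - p)                   # W_i = p_i (1 - p_i)
    z = eta + (y - p) / W             # working response z_i
    beta_new = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))
    if np.max(np.abs(beta_new - beta)) < 1e-10:
        beta = beta_new
        break
    beta = beta_new

print(beta)  # should be close to beta_true = [0.3, -1.5]
```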