Newton-Raphson Method

An iterative optimization algorithm for finding roots of functions and solving optimization problems. Derived from Taylor series expansion, it uses both gradient and curvature information for rapid convergence.

Introduction

The Newton-Raphson method belongs to the family of iterative optimization algorithms. It can be used to find roots of equations or to optimize functions by finding points where the derivative equals zero.

Root Finding Formula:
\[x_{k+1} = x_k - \frac{g(x_k)}{g'(x_k)}\]

This finds \(x\) such that \(g(x) = 0\).

Optimization Formula:
\[x_{k+1} = x_k - H(x_k)^{-1}\nabla f(x_k)\]

This finds \(x\) such that \(f'(x) = 0\) (critical points).

But how is this derived? What is the intuition behind the algorithm? Let's explore with an example.

Visualizing the Example Function

Consider the cubic function:

\[f(x) = x^3 + 3x^2 + 100\]
[Figure: plot of \(f(x) = x^3 + 3x^2 + 100\)]

We start with an initial point \(x_0 = 7.5\). The idea is to iteratively move our point closer to the critical point where \(f'(x) = 0\).

A simple approach would be gradient descent with a fixed learning rate:

\[x_{k+1} = x_k - \alpha \nabla f(x_k)\]

But can we do better by incorporating curvature information?
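For comparison, here is a minimal fixed-step gradient descent sketch on the example function. The learning rate \(\alpha = 0.01\) and iteration count are hand-picked assumptions, not values from the text:

```python
# Fixed-step gradient descent on f(x) = x^3 + 3x^2 + 100,
# whose derivative is f'(x) = 3x^2 + 6x.
def grad(x):
    return 3 * x**2 + 6 * x

x = 7.5        # same starting point as in the example
alpha = 0.01   # hand-tuned fixed learning rate (an assumption)
for _ in range(1000):
    x = x - alpha * grad(x)
# x slowly approaches the critical point at x = 0
```

With a fixed step, progress near the critical point shrinks only linearly; Newton's method reaches comparable accuracy in a handful of iterations.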

Taylor Series Approximation

Instead of just first-order, we use second-order Taylor expansion to include curvature:

\[f(x) \approx f(x_k) + f'(x_k)(x - x_k) + \frac{1}{2}f''(x_k)(x - x_k)^2\]
[Figure: first-order vs. second-order Taylor approximation at \(x = 7.5\)]

Notice that the second-order approximation captures the curvature and has its own critical point that we can solve for analytically.
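This can be checked numerically. A small sketch (the helper names `taylor2` and `x_star` are mine, not from the text):

```python
# Second-order Taylor approximation of f(x) = x^3 + 3x^2 + 100 around x_k = 7.5.
f   = lambda x: x**3 + 3*x**2 + 100
df  = lambda x: 3*x**2 + 6*x   # f'(x)
ddf = lambda x: 6*x + 6        # f''(x)

xk = 7.5

def taylor2(x):
    """f(xk) + f'(xk)(x - xk) + (1/2) f''(xk)(x - xk)^2"""
    return f(xk) + df(xk) * (x - xk) + 0.5 * ddf(xk) * (x - xk)**2

# The quadratic has its own critical point, solvable in closed form:
x_star = xk - df(xk) / ddf(xk)
```

At `x_star` the derivative of the quadratic, \(f'(x_k) + f''(x_k)(x - x_k)\), is exactly zero, which is what the derivation below solves for.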

Derivation of Newton-Raphson

To find the critical point of the second-order approximation, we take the derivative and set it to zero:

\[\frac{d}{dx}\left[f(x_k) + f'(x_k)(x - x_k) + \frac{1}{2}f''(x_k)(x - x_k)^2\right] = 0\]

This gives us:

\[0 = f'(x_k) + f''(x_k)(x - x_k)\]

Solving for \(x\):

\[x = x_k - \frac{f'(x_k)}{f''(x_k)}\]

This is exactly the Newton-Raphson update formula! The method uses both the gradient \(f'(x_k)\) and the curvature \(f''(x_k)\) to determine both direction and optimal step size.
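Applying this update repeatedly to the example function (a quick sketch; variable names are mine):

```python
# Iterate x_{k+1} = x_k - f'(x_k) / f''(x_k) on f(x) = x^3 + 3x^2 + 100,
# starting from x_0 = 7.5 as in the figures.
df  = lambda x: 3*x**2 + 6*x   # f'(x)
ddf = lambda x: 6*x + 6        # f''(x)

x = 7.5
for k in range(20):
    x = x - df(x) / ddf(x)
# x converges rapidly to the critical point at x = 0
```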

[Figure: Newton step visualization]

Python Implementation

Here's a complete implementation of the Newton-Raphson method with verbose output and convergence tracking:

import numpy as np
import matplotlib.pyplot as plt
from typing import Callable, Tuple, Optional, List

def newton_raphson(
    f: Callable[[float], float],
    f_prime: Callable[[float], float],
    x0: float,
    tol: float = 1e-10,
    max_iter: int = 100,
    verbose: bool = False
) -> Tuple[float, int, List[float]]:
    """
    Newton-Raphson method for finding roots.
    
    Parameters:
    -----------
    f : function
        The function whose root we're finding
    f_prime : function
        Derivative of f
    x0 : float
        Initial guess
    tol : float
        Tolerance for stopping (|f(x)| < tol)
    max_iter : int
        Maximum iterations
    verbose : bool
        Print iterations if True
    
    Returns:
    --------
    x : float
        Approximated root
    iterations : int
        Number of iterations used
    history : list
        History of x values
    """
    x = x0
    history = [x]
    
    for i in range(max_iter):
        fx = f(x)
        fpx = f_prime(x)
        
        if abs(fpx) < 1e-15:
            raise ValueError(f"Derivative too small at x = {x}")
        
        # Newton step
        x_new = x - fx / fpx
        history.append(x_new)
        
        if verbose:
            print(f"Iter {i}: x = {x_new:.10f}, f(x) = {f(x_new):.10e}")
        
        # Check convergence
        if abs(f(x_new)) < tol:
            return x_new, i + 1, history
        
        x = x_new
    
    print(f"Warning: Maximum iterations ({max_iter}) reached")
    return x, max_iter, history

Example 1: Root Finding

Find the root of \(f(x) = x^2 - 4\) (solutions are \(x = \pm 2\)):

[Figure: \(f(x) = x^2 - 4\) and Newton iterations from \(x_0 = 3.0\)]
def f1(x):
    return x**2 - 4

def f1_prime(x):
    return 2*x

root, iterations, history = newton_raphson(f1, f1_prime, x0=3.0, verbose=True)
print(f"\nRoot found: {root:.10f}")
print(f"Iterations: {iterations}")
print(f"Final error: {abs(f1(root)):.2e}")
| Iteration | x            | f(x)     | Step Size |
|-----------|--------------|----------|-----------|
| 0         | 3.0000000000 | 5.00e+00 | -         |
| 1         | 2.1666666667 | 6.94e-01 | 0.8333    |
| 2         | 2.0064102564 | 2.57e-02 | 0.1603    |
| 3         | 2.0000102400 | 4.10e-05 | 0.0064    |
| 4         | 2.0000000000 | 1.05e-10 | 0.00001   |
| 5         | 2.0000000000 | 0.00e+00 | 0.00000   |

✓ Root found: 2.0000000000 in just 5 iterations with quadratic convergence.

The error roughly squares at each iteration as we approach the root, the signature of quadratic convergence.
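The rate can be verified directly: near the root, the ratio \(e_{k+1}/e_k^2\) should approach \(|f''(r)/(2f'(r))| = 2/(2 \cdot 4) = 1/4\) for this function. A self-contained check (this inline loop mirrors, but does not call, the `newton_raphson` function above):

```python
# Track errors e_k = |x_k - 2| for f(x) = x^2 - 4 starting at x_0 = 3.
x, errs = 3.0, []
for _ in range(5):
    errs.append(abs(x - 2.0))
    x = x - (x**2 - 4) / (2 * x)   # Newton step

ratios = [errs[i + 1] / errs[i]**2 for i in range(len(errs) - 1)]
# ratios tend toward 1/4, confirming e_{k+1} ~ C * e_k^2
```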

Example 2: Optimization (Finding Critical Points)

For the original function \(f(x) = x^3 + 3x^2 + 100\), find where \(f'(x) = 0\):

\[f'(x) = 3x^2 + 6x\] \[f''(x) = 6x + 6\]
[Figure: \(f'(x)\) and Newton iterations from \(x_0 = 3.0\)]
# For optimization, we find roots of f'(x)
def df(x):  # This is f'(x)
    return 3*x**2 + 6*x

def ddf(x):  # This is f''(x)
    return 6*x + 6

root, iterations, history = newton_raphson(df, ddf, x0=3.0, verbose=True)
print(f"\nCritical point found: {root:.10f}")
print(f"Iterations: {iterations}")
print(f"Final error: {abs(df(root)):.2e}")
| Iteration | x            | f'(x)    | f''(x)   | Step Size (f'/f'') |
|-----------|--------------|----------|----------|--------------------|
| 0         | 3.0000000000 | 4.50e+01 | 2.40e+01 | 1.8750             |
| 1         | 1.1250000000 | 1.05e+01 | 1.28e+01 | 0.8272             |
| 2         | 0.2977941176 | 2.05e+00 | 7.79e+00 | 0.2636             |
| 3         | 0.0341661806 | 2.08e-01 | 6.20e+00 | 0.0336             |
| 4         | 0.0005643812 | 3.39e-03 | 6.00e+00 | 0.0006             |
| 5         | 0.0000001592 | 9.55e-07 | 6.00e+00 | 1.59e-07           |
| 6         | 0.0000000000 | 7.60e-14 | 6.00e+00 | 1.27e-14           |

✓ Critical point found at x = 0 (where f'(x) = 0) in just 6 iterations.

Notice how the step size (f'/f'') becomes very small as we approach the root, showing the quadratic convergence.

Note that \(f'(x) = 3x^2 + 6x = 3x(x + 2)\) has roots at \(x = 0\) and \(x = -2\). The algorithm found the root at \(x = 0\) because we started at \(x_0 = 3.0\).
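A quick sketch illustrating this sensitivity (the starting point \(-3\) is my choice for illustration):

```python
# Same derivatives as Example 2; a starting point left of x = -2
# drives the iteration to the other root of f'(x) = 3x(x + 2).
df  = lambda x: 3*x**2 + 6*x   # f'(x)
ddf = lambda x: 6*x + 6        # f''(x)

x = -3.0
for _ in range(30):
    x = x - df(x) / ddf(x)
# x converges to the critical point at -2 rather than 0
```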

Key Insights

Quadratic Convergence

Newton-Raphson exhibits quadratic convergence near the root, meaning the number of correct digits roughly doubles each iteration.

Uses Curvature Information

By incorporating the second derivative (Hessian), the method adapts the step size based on the function's local curvature.

Initial Guess Matters

The method can converge to different roots depending on the starting point, as seen in the optimization example.

Generalization to Multiple Dimensions

The formula generalizes to \(x_{k+1} = x_k - H(x_k)^{-1}\nabla f(x_k)\) where \(H\) is the Hessian matrix.
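As a minimal multivariate sketch, assuming a quadratic test function and starting point of my own choosing (picked so the Hessian is constant):

```python
import numpy as np

# f(x, y) = x^2 + x*y + y^2 has gradient [2x + y, x + 2y]
# and constant Hessian [[2, 1], [1, 2]]; its unique minimum is (0, 0).
def grad(v):
    x, y = v
    return np.array([2*x + y, x + 2*y])

H = np.array([[2.0, 1.0], [1.0, 2.0]])

v = np.array([4.0, -3.0])
for _ in range(3):
    # Solve H d = grad(v) rather than forming H^{-1} explicitly
    v = v - np.linalg.solve(H, grad(v))
```

In practice the linear system \(H d = \nabla f\) is solved directly, since explicitly inverting the Hessian is both slower and less numerically stable. For a quadratic function the second-order Taylor model is exact, so a single Newton step lands on the minimum.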