Newton-Raphson Method
An iterative algorithm for finding roots of functions and for locating critical points in optimization. Derived from the Taylor series expansion, it uses both gradient and curvature information to achieve rapid convergence.
Introduction
The Newton-Raphson method belongs to the family of iterative optimization algorithms. It can be used to find roots of equations or to optimize functions by finding points where the derivative equals zero.
Root finding: iterate \(x_{k+1} = x_k - \frac{g(x_k)}{g'(x_k)}\). This finds \(x\) such that \(g(x) = 0\).
Optimization: iterate \(x_{k+1} = x_k - \frac{f'(x_k)}{f''(x_k)}\). This finds \(x\) such that \(f'(x) = 0\) (critical points).
But how is this derived? What is the intuition behind the algorithm? Let's explore with an example.
Visualizing the Example Function
Consider the cubic function:
\[ f(x) = x^3 + 3x^2 + 100 \]
We start with an initial point \(x_k = 7.5\). The idea is to iteratively move our point closer to the critical point where \(f'(x) = 0\).
A simple approach would be gradient descent with a fixed learning rate \(\alpha\):
\[ x_{k+1} = x_k - \alpha f'(x_k) \]
But can we do better by incorporating curvature information?
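To make the baseline concrete, here is a minimal fixed-step gradient descent sketch on the example function's derivative \(f'(x) = 3x^2 + 6x\); the learning rate of 0.01 and the step count are arbitrary illustrative choices, not tuned values:

```python
# Fixed-step gradient descent toward the critical point of
# f(x) = x**3 + 3*x**2 + 100, i.e. toward a root of f'(x) = 3x^2 + 6x.
def gradient_descent(f_prime, x0, lr=0.01, n_steps=200):
    x = x0
    for _ in range(n_steps):
        x = x - lr * f_prime(x)  # direction and size set only by lr
    return x

f_prime = lambda x: 3 * x**2 + 6 * x

# From x0 = 7.5, many small steps are needed to creep toward the
# critical point at x = 0.
x = gradient_descent(f_prime, x0=7.5)
print(x)
```

With a fixed learning rate the step length never adapts to the local shape of the function, which is exactly the gap curvature information fills.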
Taylor Series Approximation
Instead of stopping at first order, we use the second-order Taylor expansion to include curvature:
\[ f(x) \approx \hat{f}(x) = f(x_k) + f'(x_k)(x - x_k) + \tfrac{1}{2} f''(x_k)(x - x_k)^2 \]
Notice that the second-order approximation captures the curvature and has its own critical point that we can solve for analytically.
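A quick numerical sketch of this quadratic model at the starting point \(x_k = 7.5\) for the example cubic (the variable names `q`, `fp`, `fpp` are illustrative choices):

```python
# Second-order Taylor model of f around x_k:
#   q(x) = f(x_k) + f'(x_k)(x - x_k) + 0.5 * f''(x_k)(x - x_k)^2
f = lambda x: x**3 + 3 * x**2 + 100
fp = lambda x: 3 * x**2 + 6 * x    # f'(x)
fpp = lambda x: 6 * x + 6          # f''(x)

xk = 7.5
q = lambda x: f(xk) + fp(xk) * (x - xk) + 0.5 * fpp(xk) * (x - xk) ** 2

# The quadratic's own critical point has a closed form: one Newton step.
x_star = xk - fp(xk) / fpp(xk)
print(x_star)
print(q(xk) == f(xk))  # model matches f exactly at the expansion point: True
```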
Derivation of Newton-Raphson
To find the critical point of the second-order approximation, we take its derivative with respect to \(x\) and set it to zero:
\[ \frac{d\hat{f}}{dx} = f'(x_k) + f''(x_k)(x - x_k) = 0 \]
This gives us:
\[ f''(x_k)(x - x_k) = -f'(x_k) \]
Solving for \(x\):
\[ x_{k+1} = x_k - \frac{f'(x_k)}{f''(x_k)} \]
This is exactly the Newton-Raphson update formula! The method uses both the gradient \(f'(x_k)\) and the curvature \(f''(x_k)\) to determine both direction and optimal step size.
Python Implementation
Here's a complete implementation of the Newton-Raphson method with verbose output and convergence tracking:
```python
import numpy as np
from typing import Callable, Tuple, List


def newton_raphson(
    f: Callable[[float], float],
    f_prime: Callable[[float], float],
    x0: float,
    tol: float = 1e-10,
    max_iter: int = 100,
    verbose: bool = False,
) -> Tuple[float, int, List[float]]:
    """
    Newton-Raphson method for finding roots.

    Parameters
    ----------
    f : function
        The function whose root we're finding
    f_prime : function
        Derivative of f
    x0 : float
        Initial guess
    tol : float
        Tolerance for stopping (|f(x)| < tol)
    max_iter : int
        Maximum iterations
    verbose : bool
        Print iterations if True

    Returns
    -------
    x : float
        Approximated root
    iterations : int
        Number of iterations used
    history : list
        History of x values
    """
    x = x0
    history = [x]
    for i in range(max_iter):
        fx = f(x)
        fpx = f_prime(x)
        if abs(fpx) < 1e-15:
            raise ValueError(f"Derivative too small at x = {x}")
        # Newton step
        x_new = x - fx / fpx
        history.append(x_new)
        if verbose:
            print(f"Iter {i}: x = {x_new:.10f}, f(x) = {f(x_new):.10e}")
        # Check convergence
        if abs(f(x_new)) < tol:
            return x_new, i + 1, history
        x = x_new
    print(f"Warning: Maximum iterations ({max_iter}) reached")
    return x, max_iter, history
```
Example 1: Root Finding
Find the root of \(f(x) = x^2 - 4\) (solutions are \(x = \pm 2\)):
```python
def f1(x):
    return x**2 - 4

def f1_prime(x):
    return 2 * x

root, iterations, history = newton_raphson(f1, f1_prime, x0=3.0, verbose=True)
print(f"\nRoot found: {root:.10f}")
print(f"Iterations: {iterations}")
print(f"Final error: {abs(f1(root)):.2e}")
```
| Iteration | x | f(x) | Step Size |
|---|---|---|---|
| 0 | 3.0000000000 | 5.00e+00 | - |
| 1 | 2.1666666667 | 6.94e-01 | 0.8333 |
| 2 | 2.0064102564 | 2.57e-02 | 0.1603 |
| 3 | 2.0000102400 | 4.10e-05 | 0.0064 |
| 4 | 2.0000000000 | 1.05e-10 | 0.00001 |
| 5 | 2.0000000000 | 0.00e+00 | 0.00000 |
✓ Root found: 2.0000000000 in just 5 iterations with quadratic convergence.
The error \(|f(x_k)|\) shrinks quadratically as we approach the root: each iteration roughly squares the previous error, doubling the number of correct digits.
Example 2: Optimization (Finding Critical Points)
For the original function \(f(x) = x^3 + 3x^2 + 100\), find where \(f'(x) = 0\):
```python
# For optimization, we find roots of f'(x), so we pass f' as the
# "function" and f'' as its "derivative".
def fprime(x):   # f'(x) = 3x^2 + 6x
    return 3 * x**2 + 6 * x

def fsecond(x):  # f''(x) = 6x + 6
    return 6 * x + 6

root, iterations, history = newton_raphson(fprime, fsecond, x0=3.0, verbose=True)
print(f"\nCritical point found: {root:.10f}")
print(f"Iterations: {iterations}")
print(f"Final error: {abs(fprime(root)):.2e}")
```
| Iteration | x | f'(x) | f''(x) | Step Size (f'/f'') |
|---|---|---|---|---|
| 0 | 3.0000000000 | 4.50e+01 | 2.40e+01 | 1.8750 |
| 1 | 1.1250000000 | 1.05e+01 | 1.28e+01 | 0.8272 |
| 2 | 0.2977941176 | 2.05e+00 | 7.79e+00 | 0.2636 |
| 3 | 0.0341661806 | 2.08e-01 | 6.20e+00 | 0.0336 |
| 4 | 0.0005643812 | 3.39e-03 | 6.00e+00 | 0.0006 |
| 5 | 0.0000001592 | 9.55e-07 | 6.00e+00 | 1.59e-07 |
| 6 | 0.0000000000 | 7.60e-14 | 6.00e+00 | 1.27e-14 |
✓ Critical point found at x = 0 (where f'(x) = 0) in just 6 iterations.
Notice how the step size (f'/f'') becomes very small as we approach the root, showing the quadratic convergence.
Note that \(f'(x) = 3x^2 + 6x = 3x(x + 2)\) has roots at \(x = 0\) and \(x = -2\). The algorithm found the root at \(x = 0\) because we started at \(x_0 = 3.0\).
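Restarting on the other side of the parabola illustrates this basin-of-attraction behavior; a compact self-contained Newton loop (rather than the full `newton_raphson` above) suffices for the sketch:

```python
# Same optimization problem, restarted at x0 = -3.0 on the far side
# of f'(x) = 3x**2 + 6x; Newton now converges to the other root.
fp = lambda x: 3 * x**2 + 6 * x   # f'(x), whose roots we seek
fpp = lambda x: 6 * x + 6         # f''(x)

x = -3.0
for _ in range(50):
    x = x - fp(x) / fpp(x)        # Newton update

print(x)  # converges to the root at x = -2
```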
Key Insights
Quadratic Convergence
Newton-Raphson exhibits quadratic convergence near the root, meaning the number of correct digits roughly doubles each iteration.
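The digit-doubling behavior can be checked directly on Example 1 by tracking the error \(e_k = |x_k - 2|\), which under quadratic convergence satisfies \(e_{k+1} \approx C\,e_k^2\):

```python
# Track |x_k - 2| for f(x) = x**2 - 4 starting at x0 = 3.
f = lambda x: x**2 - 4
fp = lambda x: 2 * x

x, errors = 3.0, []
for _ in range(5):
    errors.append(abs(x - 2))
    x = x - f(x) / fp(x)

for e in errors:
    print(f"{e:.3e}")  # each error is roughly the square of the previous one
```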
Uses Curvature Information
By incorporating the second derivative (Hessian), the method adapts the step size based on the function's local curvature.
Initial Guess Matters
The method can converge to different roots depending on the starting point, as seen in the optimization example.
Generalization to Multiple Dimensions
The formula generalizes to \(x_{k+1} = x_k - H(x_k)^{-1}\nabla f(x_k)\) where \(H\) is the Hessian matrix.
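A minimal sketch of the multivariate update, using a hand-coded gradient and Hessian for an illustrative quadratic \(f(x, y) = (x-1)^2 + 2(y+3)^2\) (the function and names are chosen for the example); solving the linear system avoids forming \(H^{-1}\) explicitly:

```python
import numpy as np

# Newton's method in R^2 for f(x, y) = (x - 1)**2 + 2*(y + 3)**2,
# whose unique minimizer is (1, -3).
def grad(v):
    x, y = v
    return np.array([2 * (x - 1), 4 * (y + 3)])

def hess(v):
    # Constant Hessian, since f is quadratic.
    return np.array([[2.0, 0.0], [0.0, 4.0]])

v = np.array([10.0, 10.0])
for _ in range(5):
    # x_{k+1} = x_k - H^{-1} grad, via a linear solve
    v = v - np.linalg.solve(hess(v), grad(v))

print(v)  # for a quadratic f, one Newton step already lands on (1, -3)
```

Because the objective here is exactly quadratic, the second-order Taylor model is the function itself and a single Newton step jumps straight to the minimizer.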