Introduction
Imagine you're modeling customer annoyance from wait times at a call center. Calls arrive randomly, so wait time X follows an Exponential distribution—most waits are short, a few are painfully long.
Now I’d argue that annoyance isn’t linear: a 10-minute wait feels more than twice as bad as a 5-minute one. So you decide to model “annoyance units” as \(Y = X²\).
Simple, right? Just take the pdf of X, replace x with \(\sqrt{y}\), and you’re done.
You plot it. It looks reasonable—peaked near zero, long tail.
But what if you actually computed the total probability, the value the CDF should reach at infinity? You would expect 1, right?
The result? 2.
A short numpy snippet confirms this:
import numpy as np
from scipy.integrate import quad

# Wrong (naive) pdf for Y = X²: just substitute x = sqrt(y) into f_X(x) = e^(-x)
def wrong_pdf(y):
    return np.exp(-np.sqrt(y))  # This integrates to 2!

# Quick numerical check of the total probability
integral, err = quad(wrong_pdf, 0, np.inf)
print(f"Numerical integral ≈ {integral:.3f} (should be 1, but it's 2)")
# prints: Numerical integral ≈ 2.000 (should be 1, but it's 2)
Your new “distribution” claims a total probability of 2 instead of 1.
That’s impossible… but it happened because you missed one small adjustment.
This “adjustment” is the Jacobian—a scaling factor that compensates for how the transformation stretches or compresses the axis at different points. Skip it, and your probabilities lie. Include it, and everything adds up perfectly again.
In this post, we’ll build the intuition, derive the math step by step, see it appear naturally in histogram equalization, visualize the stretching/shrinking empirically, and prove it with simulations.
The Intuition
To grasp why the Jacobian adjustment is necessary, let’s use a tangible analogy: think of a probability distribution as a fixed amount of sand—exactly 1 pound—spread along a number line, where the height of the sand pile at each point represents the probability density. The total sand always adds up to 1, representing 100% probability.
Now, when you transform the random variable (say, from X to Y = X²), it’s like grabbing that number line—a flexible rubber sheet—and warping it according to the transformation. You’re not adding or removing sand; you’re just stretching or compressing different parts of the sheet.
In regions where the transformation compresses the sheet (a long stretch of the original line gets squished into a shorter segment on the new Y-axis), the same amount of sand now occupies less horizontal space. To keep the total sand conserved, the pile must get taller—the density increases. For example, near Y = 0 in the squaring transformation, the X values from 0 to 0.1 get crammed into the tiny Y interval from 0 to 0.01, so the density shoots up dramatically.
Conversely, in regions where the transformation stretches the sheet (a short segment of the original line gets pulled into a longer one on the Y-axis), the sand spreads out over more space, making the pile shorter and flatter—the density decreases. For large X (say, from 10 to 11), Y stretches from 100 to 121—a much wider interval—so the density thins out there.
The key point: the total sand remains exactly 1 lb, no matter how you warp the sheet. Without accounting for this local stretching and shrinking, your new density would be inconsistent, like claiming you have 2 lb of sand after the warp. The Jacobian is the mathematical factor that automatically adjusts the height everywhere to preserve the total amount.
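If you want to feel this with numbers rather than sand, here is a quick sketch of my own: drop a million Exponential samples on both axes. The X interval [0, 0.1] and its image [0, 0.01] under squaring catch exactly the same samples, but the Y interval is ten times narrower, so the average density over it is ten times taller.
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1, size=1_000_000)
y = x**2

# x <= 0.1 if and only if y = x² <= 0.01: the same sand, a narrower pile
in_x = np.mean(x <= 0.1)   # probability mass in the X interval [0, 0.1]
in_y = np.mean(y <= 0.01)  # identical mass in the Y interval [0, 0.01]
print(in_x, in_y)               # equal, ≈ 0.095
print(in_x / 0.1, in_y / 0.01)  # average density: ≈ 0.95 vs ≈ 9.5 (10x taller)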
The Math
Let’s formalize the intuition with the example of \( Y = g(X) = X^2 \), where \( X \) has pdf \( f_X(x) = e^{-x} \) for \( x \geq 0 \) (Exponential with rate 1).
Consider a small interval around \( x \) with width \( \Delta x \).
The probability in that interval is approximately \( f_X(x) \Delta x \).
After transformation, this maps to an interval around \( y = x^2 \) with width
\( \Delta y \approx \left| g'(x) \right| \Delta x = |2x| \Delta x \).
To conserve probability:
$$ f_Y(y) \Delta y \approx f_X(x) \Delta x, $$
so
$$ f_Y(y) \approx \frac{f_X(x)}{\left| g'(x) \right|} $$
In the limit as \( \Delta x \to 0 \), this becomes exact:
$$ f_Y(y) = f_X(x) \left| \frac{dx}{dy} \right|, $$
where \( x = \sqrt{y} \) (the inverse) and
\( \frac{dx}{dy} = \frac{1}{2\sqrt{y}} \).
Plugging in:
$$ f_Y(y) = e^{-\sqrt{y}} \cdot \frac{1}{2\sqrt{y}} \quad \text{for } y > 0. $$
Without the Jacobian term \( \frac{1}{2\sqrt{y}} \), the naive
\( f_Y(y) = e^{-\sqrt{y}} \)
integrates to 2:
Let \( u = \sqrt{y} \), \( y = u^2 \), \( dy = 2u \, du \):
$$ \int_0^\infty e^{-\sqrt{y}} \, dy $$
$$ = \int_0^\infty e^{-u}\cdot 2u \, du $$
$$= 2 \int_0^\infty u e^{-u} \, du $$
$$ = 2 \Gamma(2) = 2 \cdot 1 = 2. $$
The Jacobian adjustment ensures $$ \int_0^\infty f_Y(y) \, dy = 1. $$
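As a sanity check, we can integrate the corrected density numerically with scipy (the \( 1/(2\sqrt{y}) \) singularity at zero is integrable, so quad converges):
import numpy as np
from scipy.integrate import quad

def correct_pdf(y):
    # f_Y(y) = e^(-sqrt(y)) / (2 sqrt(y)), the Jacobian-adjusted density
    return np.exp(-np.sqrt(y)) / (2 * np.sqrt(y))

integral, err = quad(correct_pdf, 0, np.inf)
print(f"Total probability ≈ {integral:.6f}")  # ≈ 1.000000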
A note on \(\Gamma\)
\(\Gamma\) is the generalization of the factorial to real (and complex) numbers, with $$\Gamma(n) = (n-1)! \quad \text{for positive integers } n.$$
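A one-liner to convince yourself, using scipy's gamma function:
from math import factorial
from scipy.special import gamma

for n in range(1, 6):
    # Gamma(n) should equal (n - 1)!
    print(n, gamma(n), factorial(n - 1))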
This scaling factor \( \left| \frac{dx}{dy} \right| \) is precisely what compensates for the local stretching and shrinking of the axis.
The General Form
Let Y = g(X), where g is a strictly monotonic (increasing or decreasing) differentiable function, and X has pdf \( f_X(x) \).
We want the pdf \( f_Y(y) \) of Y.
Consider a small interval around x with width \( \Delta x \).
The probability in that interval is approximately \( f_X(x) \Delta x \).
After transformation y = g(x), this interval maps to an interval around y with width
\( \Delta y \approx \left| g'(x) \right| \Delta x \).
Going back to the equation we developed previously:
$$ f_Y(y) = f_X(x) \left| \frac{dx}{dy} \right|, $$
where we use the inverse relation \(x = h(y) = g^{-1}(y) \), and
\( \frac{dx}{dy} = h'(y) = \frac{1}{g'(x)} \).
Thus the general formula is
$$ f_Y(y) = f_X(h(y)) \left| h'(y) \right|. $$
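The general formula translates directly into code. Here is a small helper (a sketch of mine, not part of the original derivation) that builds \( f_Y \) from the inverse \( h \) and its derivative; plugging in the squaring example reproduces the density we derived above:
import numpy as np

def transform_pdf(f_x, h, h_prime):
    # pdf of Y = g(X), given the inverse x = h(y) and its derivative h'(y)
    return lambda y: f_x(h(y)) * np.abs(h_prime(y))

f_x = lambda x: np.exp(-x)                # Exponential(1) pdf
h = lambda y: np.sqrt(y)                  # inverse of g(x) = x² on x >= 0
h_prime = lambda y: 1 / (2 * np.sqrt(y))  # dx/dy

f_y = transform_pdf(f_x, h, h_prime)
print(f_y(4.0))  # e^(-2) / 4 ≈ 0.0338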
Empirical Proof
Simulating the stretching and shrinking
The best way to “feel” the stretching and shrinking is to zoom in on two regions separately: near zero (where compression happens) and farther out (where stretching dominates).
We’ll generate four plots:
1. Original X histogram, zoomed on small values (X < 1) — to show the source of compression.
2. Corresponding Y = X² histogram, zoomed near zero — showing how those tiny X intervals get even tinier on Y (shrink).
3. Original X histogram for larger values (X > 1), with equal intervals of width 1 — to show the source of stretching.
4. Corresponding Y histogram for large values — showing how those X intervals explode into huge Y intervals (stretch).
Code
import numpy as np
import matplotlib.pyplot as plt

# Generate a large sample for clear visuals
n = 50_000
x = np.random.exponential(scale=1, size=n)
y = x**2

fig = plt.figure(figsize=(16, 10))

def plot_histogram(ax, data, bins, density, color, alpha, title, xlabel, ylabel):
    ax.hist(data, bins=bins, density=density, color=color, alpha=alpha)
    ax.set_title(title)
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)

# Plot 1: X small values (compression source)
ax1 = fig.add_subplot(2, 2, 1)
plot_histogram(ax1, x[x < 1], 50, True, 'skyblue', 0.7, 'X ~ Exp(1), zoomed on X < 1', 'X', 'Density')

# Plot 2: corresponding Y = X² values near zero (shrink)
ax2 = fig.add_subplot(2, 2, 2)
plot_histogram(ax2, y[y < 1], 50, True, 'salmon', 0.7, 'Y = X², zoomed on Y < 1', 'Y', 'Density')

# Plot 3: larger X values (stretching source)
ax3 = fig.add_subplot(2, 2, 3)
plot_histogram(ax3, x[(x > 1) & (x < 12)], 50, True, 'skyblue', 0.7, 'X ~ Exp(1), X > 1', 'X', 'Density')
# Equal-width intervals of 1 on larger X
large_x_starts = [1, 3, 5, 7, 9, 11]
large_x_lines = large_x_starts + [s + 1 for s in large_x_starts]
for line in large_x_lines:
    if line < 12:
        ax3.axvline(line, color='gray', linestyle='--', alpha=0.6)

# Plot 4: corresponding Y = X² values; the same boundaries, squared, spread far apart
ax4 = fig.add_subplot(2, 2, 4)
plot_histogram(ax4, y[(y > 1) & (y < 144)], 50, True, 'salmon', 0.7, 'Y = X², Y > 1', 'Y', 'Density')
for line in large_x_lines:
    ax4.axvline(line**2, color='gray', linestyle='--', alpha=0.6)

plt.tight_layout()
plt.show()

Simulating the Jacobian adjustment
To see the Jacobian adjustment in action, let’s simulate data from the Exponential(1) distribution for X, compute Y = X², and plot the empirical histogram of Y against the theoretical pdf for increasing sample sizes n. As n grows, the histogram should converge to the correct adjusted pdf, not the naive one.
Code
import numpy as np
import matplotlib.pyplot as plt

def correct_pdf(y):
    return np.exp(-np.sqrt(y)) / (2 * np.sqrt(y))

def naive_pdf(y):
    return np.exp(-np.sqrt(y))

# Sample sizes to test
sample_sizes = [100, 1_000, 10_000]
fig, axs = plt.subplots(1, len(sample_sizes), figsize=(15, 5))
y_vals = np.linspace(0.01, 50, 1000)  # Range for plotting theoretical pdfs

for i, n in enumerate(sample_sizes):
    # Sample X ~ Exp(1) and transform
    x = np.random.exponential(scale=1, size=n)
    y = x**2
    # Plot histogram (normalized to density)
    axs[i].hist(y, bins=50, range=(0, 50), density=True, alpha=0.6, color='skyblue', label='Empirical Histogram')
    # Plot theoretical pdfs
    axs[i].plot(y_vals, correct_pdf(y_vals), 'g-', label='Correct PDF (with Jacobian)')
    axs[i].plot(y_vals, naive_pdf(y_vals), 'r--', label='Naive PDF (no Jacobian)')
    axs[i].set_title(f'n = {n}')
    axs[i].set_xlabel('Y = X²')
    axs[i].set_ylabel('Density')
    axs[i].legend()
    axs[i].set_ylim(0, 0.5)  # For consistent viewing
    axs[i].grid(True)        # Add grid to each subplot

# Set the figure DPI to 250 for higher resolution
fig.set_dpi(250)
plt.tight_layout()
plt.show()
And the result is what we expect.
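Beyond eyeballing, we can test the fit formally. Since \( F_Y(y) = P(X^2 \leq y) = 1 - e^{-\sqrt{y}} \), a Kolmogorov–Smirnov test against that CDF (my addition to the experiment, not in the plots above) should find no evidence of mismatch:
import numpy as np
from scipy.stats import kstest

y = np.random.exponential(scale=1, size=10_000) ** 2

# Theoretical CDF of Y = X²: F_Y(y) = 1 - exp(-sqrt(y))
stat, pvalue = kstest(y, lambda t: 1 - np.exp(-np.sqrt(t)))
print(f"KS statistic = {stat:.4f}, p-value = {pvalue:.3f}")  # large p-value: fit not rejected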

Histogram Equalization: a real-world application

A classic example where the Jacobian adjustment appears naturally is histogram equalization in image processing.
We treat pixel intensities X (typically in \([0, 255]\)) as samples from some distribution with empirical pdf based on the image histogram.
The goal is to transform them to new intensities Y so that Y is approximately uniform on \([0, 255]\) — this spreads out the values and improves contrast.
The transformation used is exactly the scaled cumulative distribution function (CDF) of X:
$$ Y = 255 \cdot F_X(X) $$
where \( F_X(x) = \int_{-\infty}^x f_X(t) \, dt \) (empirical CDF in practice).
Why does this work? It is a direct application of the Probability Integral Transform (PIT):
If \( Y = F_X(X) \) and X is continuous, then Y ~ Uniform\([0,1]\).
Scaling by 255 gives Uniform\([0,255]\).
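You can watch the PIT flatten a distribution in a few lines (a quick sketch): push Exponential(1) samples through their own CDF, \( F_X(x) = 1 - e^{-x} \), and the deciles of the result each hold about 10% of the mass.
import numpy as np

x = np.random.exponential(scale=1, size=100_000)
u = 1 - np.exp(-x)  # U = F_X(X), the probability integral transform

# Each decile of [0, 1] should hold roughly 10% of the samples
counts, _ = np.histogram(u, bins=10, range=(0, 1))
print(counts / len(u))  # every entry ≈ 0.1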
Now see the Jacobian at work:
Let \( g(x) = L \cdot F_X(x) \) (\( L = 255 \)).
The derivative \( g'(x) = L \cdot f_X(x) \) (since the derivative of the CDF is the pdf).
Apply the change-of-variables formula:
$$ f_Y(y) = f_X(x) / |g'(x)| = f_X(x) / (L f_X(x)) = 1/L $$
The \( f_X(x) \) cancels perfectly, leaving a constant (uniform) density.
The Jacobian factor \( 1 / |g'(x)| \) automatically flattens the distribution by compensating for regions where the original density was high or low.
In discrete images, rounding makes it approximate, but the principle is the same.
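Here is a minimal sketch of that discrete version in numpy (my illustration, assuming an 8-bit grayscale array img; the post linked below covers the real thing):
import numpy as np

def equalize(img):
    # Empirical CDF of pixel intensities, then y = 255 * F_X(x) as a lookup table
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    cdf = hist.cumsum() / img.size
    lut = np.round(255 * cdf).astype(np.uint8)
    return lut[img]

# Example: a low-contrast image squeezed into [100, 150) spreads out toward [0, 255]
img = np.random.randint(100, 150, size=(64, 64)).astype(np.uint8)
eq = equalize(img)
print(img.min(), img.max(), '->', eq.min(), eq.max())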
For a deeper dive into histogram equalization with examples, see my earlier post: here.
In Conclusion
The Jacobian adjustment is one of those quiet pieces of mathematics that feels unnecessary—until you skip it and suddenly your probabilities don’t add up to 1 anymore. Whether you’re squaring waiting times, modeling energy from speed, or flattening image histograms, the transformation changes not just the values but how probability is distributed across them. The factor \( \left| \frac{dx}{dy} \right| \) (or its multivariate cousin, the determinant) is the precise compensation that keeps the total probability conserved while accounting for local stretching and compression.
Next time you transform a random variable, remember the sand on the rubber sheet: warp the axis all you want, but the total sand must stay the same. The Jacobian Adjustment is the rule that makes it happen.