Trans-dimensional change of variables and the integral Jacobian

09 Aug 2015

I’ve been trying to solidify my understanding of manipulating random variables (RVs), particularly transforming easy-to-use RVs into more structurally interesting RVs. This note will go over the standard univariate change of variables method, and then expand on probability density functions for more complicated maps using the integral Jacobian.

Let’s refresh our memory on the change of variables formula - let’s say we have a simple univariate distribution \(x \sim p_x\) whose density, \(p_x(\cdot)\), we can compute and from which we can draw samples. But what we’re really interested in is the distribution of some transformation of \(x\), \(y = f(x)\). What is the probability density of \(y\), \(p_y(y)\)?

This pdf can be derived using the familiar change of variables method. At its heart, the change of variables derivation starts with volume preservation:

\[\begin{align} \int_{y \in f(A)} p_y(y)dy = \int_{x \in A} p_x(x)dx \, . \end{align}\]

That is, what we care about is ensuring that the probability of some set \(A \subseteq \mathcal{X}\), computed using the density \(p_x(x)\), is exactly the same as the probability of the transformed set, \(f(A) \subseteq \mathcal{Y}\), computed with \(p_y(y)\).

If \(f\) is one-to-one, then we can derive \(p_y(y)\) by equating

\[\begin{align} p_y(y)dy &= p_x(x)dx \\ \implies p_y(y) &= p_x(x)\left|\frac{dx}{dy}\right| \\ &= p_x(x)\left|\frac{df^{-1}(y)}{dy}\right| \\ \end{align}\]
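To make this concrete, here's a minimal numerical check (a sketch, assuming numpy and scipy are available): take \(x \sim \text{Normal}(0, 1)\) and \(y = f(x) = e^x\), so that \(f^{-1}(y) = \log y\) and \(\left|\frac{df^{-1}(y)}{dy}\right| = \frac{1}{y}\). The formula should reproduce the lognormal density:

```python
import numpy as np
from scipy import stats

# x ~ N(0, 1), y = f(x) = exp(x), so f^{-1}(y) = log(y)
def p_y(y):
    # change of variables: p_y(y) = p_x(f^{-1}(y)) * |d f^{-1} / dy|
    return stats.norm.pdf(np.log(y)) * (1.0 / y)

ys = np.linspace(0.1, 5.0, 50)
# the result should match scipy's lognormal density (shape s = 1)
assert np.allclose(p_y(ys), stats.lognorm.pdf(ys, s=1))
print("univariate change of variables checks out")
```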

My intuition is that in order to obtain a valid density \(p_y(y)\) (that we can integrate with respect to infinitesimals \(dy\)), we need to take the function \(p_x(x)\) (which is integrable against infinitesimals \(dx\)) and slow it down, in a sense. We need to rescale it by the ratio of how quickly \(y\) grows compared to \(x\): if \(y\) grows twice as quickly as \(x\), then we need to halve the density so that it still integrates to one with respect to \(y\).

If I ever forget how to keep the direction of the transformation \(f\) straight, I think of a uniform random variable, \(x \sim \text{Unif}(0,1)\), and the simple rescaling \(y = f(x) = a \cdot x\). This simple demo highlights the idea of volume preservation for the rescaled uniform:

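Here's a minimal static sketch of that demo (assuming numpy and matplotlib): draw a pile of samples of \(x\), rescale them by \(a\), and compare the histogram of \(y\) to the predicted constant density:

```python
import numpy as np
import matplotlib.pyplot as plt

a = 3.0
x = np.random.rand(100_000)     # x ~ Unif(0, 1)
y = a * x                       # y = f(x) = a * x

# the histogram of y should sit at the constant height 1/a on (0, a)
plt.hist(y, bins=50, density=True, alpha=0.5, label="samples of $y$")
plt.axhline(1.0 / a, color="k", ls="--", label="$p_y(y) = 1/a$")
plt.legend()
plt.show()
```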

Intuitively it’s obvious what the resulting pdf of \(y\) should be - it should be constant between \(0\) and \(a\). Furthermore, \(a\) is stretching (or squeezing) the range of \(y\), so the unnormalized distribution \(\tilde p_y(y) = 1\) needs to be rescaled to reflect this. The rescaling is pretty clear: \(p_y(y) = \frac{1}{a}\), which matches the \(\left|\frac{df^{-1}(y)}{dy}\right| = \frac{1}{a}\) rescaling given by \(f^{-1}(y) = \frac{y}{a}\).

#### Trans-dimensional mappings

Now let’s imagine the situation where we have a lot of random variables, \(x_1, \dots, x_K \sim p_x\), from an easy-to-manipulate distribution, and we want to reason about the distribution of \(y = f(x_1, \dots, x_K)\) for some scalar-valued function \(f\). How can we use our inverse Jacobian trick from the univariate change of variables?

The key is to create our own one-to-one map from \(\mathcal{X}^K\) to a convenient \(K\)-dimensional space that is easy to manipulate. For a scalar-valued function, such a transformation might be

\[\begin{align} \boldsymbol{f}(\boldsymbol{x}) = \boldsymbol{y} &= \left( f(\boldsymbol{x}), x_2, \dots, x_K \right) \\ &\equiv (y, y_2, \dots, y_K) \end{align}\]

Then we can use the multivariate change of variables formula

\[\begin{align} p_{\boldsymbol{y}}( \boldsymbol{y} ) &= p_{\boldsymbol{x}}(\boldsymbol{x}) \left| J(\boldsymbol{f}^{-1}) \right| \end{align}\]

where \(J(\boldsymbol{f}^{-1})\) is the determinant of the Jacobian matrix of the inverse one-to-one map. Now we have a distribution over the vector \(\boldsymbol{y}\) that admits the distribution over the RV that we care about, \(f(\boldsymbol{x})\), as its first marginal. So we can marginalize out the extra dimensions:

\[\begin{align} p_y(y) &= \int p_{\boldsymbol{x}}(\boldsymbol{x}) \left| J(\boldsymbol{f}^{-1}) \right| dy_2 \dots dy_K \end{align}\]
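Here's a quick numerical sanity check of this recipe (a sketch, assuming numpy and scipy): take \(K = 2\) with \(x_1, x_2 \sim \text{Unif}(0,1)\) and the product \(y = f(x_1, x_2) = x_1 x_2\). The one-to-one map \(\boldsymbol{f}(x_1, x_2) = (x_1 x_2, x_2)\) has inverse \(\boldsymbol{f}^{-1}(y, y_2) = (y / y_2, y_2)\) with Jacobian determinant \(1 / y_2\), and marginalizing out \(y_2\) should recover the known product-of-uniforms density \(p_y(y) = -\log y\) on \((0, 1)\):

```python
import numpy as np
from scipy import integrate

# x1, x2 ~ Unif(0, 1), y = f(x1, x2) = x1 * x2.
# One-to-one map f(x) = (x1 * x2, x2), inverse (y / y2, y2), |det J(f^{-1})| = 1 / y2.
def integrand(y2, y):
    # p_x(y / y2) * p_x(y2) * (1 / y2); the uniform densities are indicators,
    # and p_x(y / y2) = 1 exactly when y / y2 is in (0, 1), i.e. when y2 > y
    return 1.0 / y2 if y2 > y else 0.0

for y in [0.1, 0.5, 0.9]:
    p_y, _ = integrate.quad(integrand, 0.0, 1.0, args=(y,), points=[y])
    print(p_y, -np.log(y))    # the two values should agree
```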

We note that if we can find some function \(g(y)\) such that

\[\begin{align} g(y) = g(f(\boldsymbol{x})) = p_{\boldsymbol{x}}(\boldsymbol{x}) \end{align}\]

(and not as a function of \(y_2, y_3, \dots, y_K\)), then we can re-write the marginal distribution over \(y\) as

\[\begin{align} p_y(y) &= g(y) \int \left|J(\boldsymbol{f}^{-1}) \right| dy_2 \dots dy_K \end{align}\]

where the right-hand term (under the integral) is called the integral Jacobian. This essentially plays the volume correction role for trans-dimensional transformations.

#### Example: Sum of Exponentials

To solidify our understanding, let’s prove a well-known identity. Say we have a collection of independent draws from an exponential distribution with rate \(\lambda\)

\[\begin{align} X_1, \dots, X_K &\sim \text{Expo}(\lambda) \end{align}\]

Our goal is to use the integral Jacobian method to derive the pdf of a Gamma distribution with convolution (shape) parameter \(K\) and rate parameter \(\lambda\). That is, we already know that

\[\begin{align} \sum_k X_k \sim \text{Gamma}(K, \lambda) \end{align}\]

so we’ll derive the pdf of a Gamma by using the integral Jacobian change of variables method with the many-to-one map \(y \equiv f(X_1, \dots, X_K) = \sum_k X_k\).

Let’s define the one-to-one map

\[\begin{align} \boldsymbol{f}(X_1, \dots, X_K) &= \left( \sum_k X_k, X_2, \dots, X_K \right) \end{align}\]

and note its inverse

\[\begin{align} \boldsymbol{f}^{-1}(y, y_2, \dots, y_K) &= \left( y - \sum_{k\neq1} y_k, y_2, \dots, y_K \right) \end{align}\]

We know that the pdf of an exponential is \(p_x(x) = \lambda e^{-\lambda x}\) for \(x > 0\), so we can write the distribution of interest as

\[\begin{align} p_y(y) &= \int_{\tilde{\mathcal{Y}}} p_x(x_1)\dots p_x(x_K) \left| J(\boldsymbol{f}^{-1}) \right| dy_2 \dots dy_K \\ &= g(y) \int_{\tilde{\mathcal{Y}}} \left| J(\boldsymbol{f}^{-1}) \right| dy_2 \dots dy_K \end{align}\]

where

\[\begin{align} g(y) &= \lambda^{K} e^{-\lambda \sum_k x_k} \\ &= \lambda^{K} e^{-\lambda y} \end{align}\]

and we note that the region of integration is constrained to

\[\begin{align} \tilde{\mathcal{Y}} &= \left\{ (y_2, \dots, y_K) : y_k > 0 \text{ for all } k, \; \sum_{k=2}^K y_k < y \right\} \end{align}\]

since each \(x_k\) must be positive - in particular, \(x_1 = y - \sum_{k \neq 1} y_k > 0\) gives the upper constraint on the sum.

So a lot of the action is in the integral Jacobian. We can compute the Jacobian determinant by noting that the Jacobian matrix has the structure

\[\begin{align} J(\boldsymbol{f}^{-1}) &= \begin{pmatrix} \frac{df^{-1}_1}{dy} & \frac{df^{-1}_1}{dy_2} & \dots & \frac{df^{-1}_1}{dy_K} \\ \frac{df^{-1}_2}{dy} & \frac{df^{-1}_2}{dy_2} & \dots & \frac{df^{-1}_2}{dy_K} \\ \vdots & & \ddots & \vdots \\ \frac{df^{-1}_K}{dy} & \frac{df^{-1}_K}{dy_2} & \dots & \frac{df^{-1}_K}{dy_K} \end{pmatrix} = \begin{pmatrix} 1 & -1 & \dots & -1 \\ 0 & 1 & \dots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \dots & 1 \end{pmatrix} = I + u^\intercal v \end{align}\]

where \(u = (1, 0, \dots, 0)\) and \(v = (0, -1, \dots, -1)\). By the matrix determinant lemma, \(\det(I + u^\intercal v) = 1 + v u^\intercal = 1 + 0 = 1\). Intuitively this makes sense - none of the dimensions are stretched or skewed, there is just a translation of the first dimension, so the map is volume preserving.
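A quick numerical confirmation of this determinant (a sketch, assuming numpy):

```python
import numpy as np

K = 5
u = np.zeros(K); u[0] = 1.0       # u = (1, 0, ..., 0)
v = np.zeros(K); v[1:] = -1.0     # v = (0, -1, ..., -1)
J = np.eye(K) + np.outer(u, v)    # first row (1, -1, ..., -1), identity below

# matrix determinant lemma: det(I + u^T v) = 1 + v . u = 1 + 0 = 1
print(np.linalg.det(J))           # -> 1.0
```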

Now, plugging it in

\[\begin{align} p_y(y) &= g(y) \int_{\tilde{\mathcal{Y}}} dy_2 \dots dy_K \end{align}\]

we see that the integral is just the volume of the set \(\tilde{\mathcal{Y}}\). This set is familiar - it’s a \((K-1)\)-dimensional simplex that’s been scaled by \(y\). The volume of this set is

\[\begin{align} \int_{\tilde{\mathcal{Y}}} dy_2\dots dy_K &= \frac{y^{K-1}}{(K-1)!} \end{align}\]
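We can sanity-check this volume by Monte Carlo (a sketch, assuming numpy): sample points uniformly in the cube \([0, y]^{K-1}\) and count the fraction that lands in the simplex.

```python
import numpy as np
from math import factorial

K, y, n = 4, 2.0, 1_000_000
pts = y * np.random.rand(n, K - 1)        # uniform draws in the cube [0, y]^{K-1}
inside = pts.sum(axis=1) < y              # indicator of the scaled simplex
print(inside.mean() * y ** (K - 1))       # Monte Carlo volume estimate
print(y ** (K - 1) / factorial(K - 1))    # y^{K-1} / (K-1)!
```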

Plugging this volume in gives us the final form of the Gamma pdf:

\[\begin{align} p_y(y) &= \lambda^K e^{-\lambda y} \frac{y^{K-1}}{(K-1)!} \end{align}\]

where the final “!” is both the factorial operator and meant to convey my excitement for deriving the correct expression.
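As a final sanity check (a sketch, assuming numpy and matplotlib), we can compare a histogram of simulated sums of exponentials against the density we just derived:

```python
import numpy as np
import matplotlib.pyplot as plt
from math import factorial

K, lam = 5, 2.0
# note numpy parameterizes the exponential by scale = 1 / rate
y = np.random.exponential(scale=1.0 / lam, size=(100_000, K)).sum(axis=1)

grid = np.linspace(1e-6, y.max(), 200)
pdf = lam ** K * np.exp(-lam * grid) * grid ** (K - 1) / factorial(K - 1)

plt.hist(y, bins=80, density=True, alpha=0.5, label="sums of $K$ exponentials")
plt.plot(grid, pdf, "k-", label="derived Gamma pdf")
plt.legend()
plt.show()
```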

Thanks to Alex Franks and Alex D’Amour for introducing me to this concept and discussing it with me.