Probability integral transform

The probability integral transform states that, for a continuous random variable \(X\), the distribution of \(Y = F_X(X)\) is \(\text{Uniform}(0, 1)\). I give some intuition for this statement.
Author

Carson Zhang

Published

December 4, 2023

The probability integral transform states that, for a continuous random variable \(X\), the distribution of \(Y = F_X(X)\) is \(\text{Uniform}(0, 1)\). This result underlies inverse transform sampling. It illustrates why p-values are uniformly distributed under the null hypothesis. It is central to how copulas can model joint distributions. But why does this make sense?
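Before building intuition, here is a quick numerical sanity check of the claim. This is a minimal sketch of my own, using an Exponential(1) distribution for \(X\); any continuous distribution would do.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Draw from a continuous distribution and push the sample through its own CDF.
x = rng.exponential(scale=1.0, size=100_000)  # X ~ Exponential(1)
y = stats.expon.cdf(x)                        # Y = F_X(X)

# If the claim holds, Y should be indistinguishable from Uniform(0, 1).
print(stats.kstest(y, "uniform"))             # KS statistic tiny, p-value large
```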

Suppose we have a random variable \(X\) from an arbitrary probability distribution.

What does \(Y = F_X(X)\) look like?

Let’s try to draw the pdf of \(Y\) one section at a time.

First, suppose we select the top 3% of the distribution (i.e., values between the \(0.97\)- and \(1\)-quantiles of the distribution).

The orange lines bound the top 3%.

For now, since we don’t know what the density of \(Y = F_X(X)\) looks like, let’s say it’s an arbitrary curve.

However, recall that we selected the top 3% of the probability mass, so within the orange interval (which has width \(1 - 0.97 = 0.03\)) the area under the curve must be \(0.03\), and therefore the pdf must average \(1\) over the orange interval.

Now, think about the region between the \(0.97\) and \(0.98\)-quantiles of the distribution. By definition, this comprises 1% of the probability mass (\(0.98 - 0.97 = 0.01\)), so we need to adjust our curve to satisfy this condition.

In fact, every such interval has this property (even arbitrarily small ones): the width of the interval equals its probability mass. So the pdf of \(Y\) must average \(1\) over every sub-interval of \([0, 1]\), no matter its size or location.
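To spell this out (assuming \(F_X\) is invertible, as in the proof below): if \(q_a = F_X^{-1}(a)\) and \(q_b = F_X^{-1}(b)\) are the \(a\)- and \(b\)-quantiles of \(X\) for any \(0 \leq a < b \leq 1\), then

\[ P(a \leq Y \leq b) = P(q_a \leq X \leq q_b) = F_X(q_b) - F_X(q_a) = b - a, \]

so the pdf of \(Y\) must average \((b - a)/(b - a) = 1\) over \([a, b]\).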

It is natural for me to suspect that the pdf of \(Y\) is a horizontal line at \(1\): this is the only function I can think of that guarantees this property.
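A quick simulation (a sketch of my own, using a standard normal for \(X\)) makes this concrete: bin \(Y = F_X(X)\) into equal-width bins and check that the estimated density is roughly \(1\) in every bin.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

x = rng.normal(size=200_000)   # X ~ Normal(0, 1)
y = stats.norm.cdf(x)          # Y = F_X(X)

# Estimate the density of Y on 20 equal-width bins covering [0, 1].
density, edges = np.histogram(y, bins=20, range=(0.0, 1.0), density=True)
print(np.round(density, 2))    # every entry should be close to 1
```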

The above insight illustrates the theorem, which we now state and prove.

Theorem (Probability Integral Transform): if \(X\) is a continuous random variable with CDF \(F_X\), then \(Y = F_X(X) \sim \text{Uniform}(0, 1)\).

Proof: the following is the standard proof of the PIT, as found on the Wikipedia page.

\[ \begin{align} F_Y(y) &= P(Y \leq y)\\ &= P(F_X(X) \leq y) && \text{(substituted the definition of } Y)\\ &= P(X \leq F_X^{-1}(y)) && \text{(applied } F_X^{-1} \text{ to both sides)}\\ &= F_X(F_X^{-1}(y)) && \text{(the definition of a CDF)}\\ &= y \end{align} \]

Therefore, \(Y \sim \text{Uniform}(0, 1)\).
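Read in the other direction, this is why inverse transform sampling works: if \(U \sim \text{Uniform}(0, 1)\), then \(F_X^{-1}(U)\) has CDF \(F_X\). A minimal sketch, using the exponential quantile function \(F_X^{-1}(u) = -\ln(1 - u) / \lambda\) as the example:

```python
import numpy as np

rng = np.random.default_rng(1)
lam = 2.0                       # rate of the target Exponential(lam) distribution

u = rng.uniform(size=100_000)   # U ~ Uniform(0, 1)
x = -np.log(1.0 - u) / lam      # X = F_X^{-1}(U)

# Moments should match Exponential(lam): mean 1/lam = 0.5, variance 1/lam**2 = 0.25.
print(x.mean(), x.var())
```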

P-value distribution under \(H_0\) 1

The p-value of a test statistic \(T(X)\) for a one-sided test with a “greater than” alternative is \(\Pr_{H_0}(T \geq t(x))\). This already looks like one minus a CDF (for the “less than” alternative, the p-value is exactly a CDF evaluated at the observed statistic), so the insight from above should carry over.

Define \(P_{\text{greater}} := \Pr_{H_0}(T \geq t(x)) = 1 - F_{T; H_0}(T)\), treating the p-value as a random variable: the observed statistic \(t(x)\) is itself a realization of \(T\).

\[ \begin{align} F_{P_{\text{greater}}}(p) &= \Pr(P_{\text{greater}} \leq p) && \text{(definition of a CDF)}\\ &= \Pr((1 - F_{T; H_0}(T)) \leq p)\\ &= \Pr(-F_{T; H_0}(T) \leq (p - 1))\\ &= \Pr(F_{T; H_0}(T) \geq (1 - p))\\ &= 1 - \Pr(F_{T; H_0}(T) \leq (1 - p))\\ &= 1 - \Pr(T \leq F_{T; H_0}^{-1}(1 - p)) && \text{(applied } F_{T; H_0}^{-1} \text{ to both sides)}\\ &= 1 - F_{T; H_0}(F_{T; H_0}^{-1}(1 - p)) && \text{(definition of a CDF)}\\ &= 1 - (1 - p)\\ &= p\\ &= F_{U(0, 1)}(p) \end{align} \]

Thus, we have shown that one-sided p-values (for a test statistic that is continuous under \(H_0\)) are uniformly distributed under the null hypothesis.2
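As a sanity check, we can simulate this. The sketch below (my own example) uses a one-sided z-test for the mean of a standard normal sample: generate data under \(H_0\) many times, compute the one-sided p-value each time, and compare the resulting p-values to \(\text{Uniform}(0, 1)\).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, n_sim = 30, 50_000

# Data generated under H_0: each sample has mean 0 and known variance 1.
data = rng.normal(loc=0.0, scale=1.0, size=(n_sim, n))
z = np.sqrt(n) * data.mean(axis=1)   # test statistic T; N(0, 1) under H_0

p_greater = stats.norm.sf(z)         # one-sided p-value: Pr_{H_0}(T >= t(x))

# Under H_0 the p-values should look Uniform(0, 1).
print(stats.kstest(p_greater, "uniform"))
```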

Acknowledgements

Thank you to Meimingwei Li, Raphael Rehms, Dr. Fabian Scheipl, Prof. Michael Schomaker, and J.P. Weideman for their helpful input.

Footnotes

  1. Notation and proof adapted from, and inspired by, Raphael Rehms’s exercise and solution from the Statistical Methods in Epidemiology course for master’s students at LMU Munich.

  2. This holds only for divergence p-values, not decision p-values. My understanding is that divergence p-values are one-sided p-values. Thanks to Prof. Schomaker for this insight.