8. Grid approximation and statistical inference#

A Bayesian model is a machine that takes the prior distribution and the likelihood as input and, using Bayes' rule as an engine, generates the posterior distribution. However, knowing the mathematical rule is usually of little help, since the posterior can rarely be computed in closed form. Restricting ourselves to those models that allow such mathematical manipulation is a Procrustean solution.

To solve this problem, we need numerical techniques that approximate the mathematical manipulation.

Procrustean solution

In Greek mythology, Procrustes was a rogue smith and bandit from Attica who attacked people by stretching them or cutting off their legs, so as to force them to fit the size of an iron bed.

The word Procrustean is thus used to describe situations where an arbitrary standard is used to measure success, while completely disregarding obvious harm that results from the effort.

See also

You can read more about Procrustes and Procrustean solutions in the Wikipedia article on Procrustes.

8.1. Grid approximation#

A simple solution when there are few parameters (typically one or two) consists of generating a grid of values for them. Let \(\theta_j\) be one such value; then we can calculate the posterior distribution at \(\theta_j\) (up to a constant of proportionality) using the formula

\[p(\theta_j|\mathbf{Y})\propto p(\mathbf{Y}|\theta_j)p(\theta_j).\]

An important consequence of this fact is that we can generate a sample from the posterior distribution directly from the proposed grid of values: we simply select each \(\theta_j\) with probability proportional to \(p(\mathbf{Y}|\theta_j)p(\theta_j)\). To get a good approximation of the posterior we need two conditions: first, a fine grid; second, a large sample drawn from that grid.
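As a concrete illustration, the following is a minimal sketch of grid approximation for a Beta-Binomial model (the same model explored in the notebook 09_BetaBinomialGrid.ipynb mentioned below). The data values `y` and `n`, the flat `Beta(1, 1)` prior, and the grid size are illustrative assumptions of this sketch, not values from the course.

```python
import numpy as np
from scipy import stats

# Hypothetical data: y successes in n Bernoulli trials (illustrative values)
y, n = 6, 9

# Grid of candidate values for theta
grid = np.linspace(0, 1, 1000)

# Flat Beta(1, 1) prior evaluated on the grid (an illustrative choice)
prior = stats.beta.pdf(grid, 1, 1)

# Likelihood of the data at each grid point
likelihood = stats.binom.pmf(y, n, grid)

# Unnormalized posterior, then normalized so the weights sum to one
posterior = likelihood * prior
posterior /= posterior.sum()

# Sample from the grid proportionally to the posterior weights
rng = np.random.default_rng(42)
samples = rng.choice(grid, size=10_000, p=posterior)
```

Normalizing the weights lets us reuse them both as a discrete approximation of the posterior and as the sampling probabilities for the grid.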

8.2. Hypothesis test#

Sometimes statistical inference can be described as follows:

  1. We have a hypothesis, which might be true or false (\(H:\theta\in\Theta_1\)).

  2. We gather statistical evidence for or against the hypothesis.

  3. We use (or should use) Bayes' rule to logically deduce the impact of the evidence on the hypothesis:

\[\mathbb{P}(H|\mathbf{Y})=\mathbb{P}(\theta\in\Theta_1|\mathbf{Y})=\int_{\Theta_1}p(\theta|\mathbf{Y})d\theta.\]
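Continuing the illustrative grid sketch above, and taking \(H:\theta>0.5\) as a hypothetical hypothesis, this integral reduces to a sum of the normalized grid weights over \(\Theta_1\):

```python
# P(H | Y) for the hypothetical hypothesis H: theta > 0.5,
# approximated by summing the normalized grid weights over Theta_1
p_H = posterior[grid > 0.5].sum()
```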

Hypothesis test

Generally speaking, in a hypothesis test, we want to calculate

\[\mathbb{P}(H|\text{evidence})=\frac{\mathbb{P}(\text{evidence}|H)\mathbb{P}(H)}{\mathbb{P}(\text{evidence})}.\]

To increase the posterior probability, it is crucial to increase the prior probability \(\mathbb{P}(H)\); this requires cognitive and argumentative effort and is not limited to a simple statistical test. Statistics is not a substitute for argumentation and science.

In the notebook 08_LindleysParadox.ipynb in the course repository, we explore Lindley’s paradox to compare hypothesis testing in the frequentist and Bayesian frameworks.

8.3. Point estimation#

The Bayesian estimator consists of the whole posterior distribution. However, sometimes we need to report a single point value for the parameters. In this case it is common to report the MAP (maximum a posteriori) estimate. Unfortunately, this estimator can yield absurd results. For example, when estimating the proportion of water on the Earth's surface from a handful of observations that all happen to land on water, the MAP under a uniform prior is 1, i.e., a planet entirely covered by water. Thus, instead of reporting the MAP, other popular options are the posterior mean and the posterior median. Another option is to define a loss function and find the estimator that minimizes it. For instance, the mean minimizes the quadratic loss, while the median minimizes the absolute loss.
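Continuing the grid sketch from above (the variables `grid`, `posterior`, and `samples` are the illustrative ones defined there), these point estimates can be computed as follows:

```python
# Point estimates from the grid posterior and the posterior samples
map_estimate = grid[np.argmax(posterior)]  # maximum a posteriori (MAP)
post_mean = samples.mean()                 # minimizes the quadratic loss
post_median = np.median(samples)           # minimizes the absolute loss
```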

8.4. Region and interval estimation#

If we have a sample \(\tilde\theta_1,\ldots,\tilde\theta_S\) from the (approximate) posterior distribution, we can estimate \(\mathbb{P}(\theta\in\Theta_1|\mathbf{Y})=\mathbb{E}\left[1_{\theta\in\Theta_1}|\mathbf{Y}\right]\) through

\[\frac{1}{S}\sum_{s=1}^S 1_{\tilde\theta_s\in\Theta_1},\]

where \(\Theta_1\) is (in principle) an arbitrary region of the parameter space.
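For instance, with the illustrative samples generated above, the hypothesis probability computed earlier from the grid weights can also be estimated as a Monte Carlo average of indicators:

```python
# Monte Carlo estimate of P(theta in Theta_1 | Y) for Theta_1 = (0.5, 1]
p_H_mc = (samples > 0.5).mean()
```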

In this way, it becomes particularly easy to estimate intervals \((\theta_1, \theta_2)\) such that \(\mathbb{P}(\theta\in(\theta_1,\theta_2)|\mathbf{Y})=1-\alpha\). These intervals are known as credible intervals. The shortest interval with posterior probability \(1-\alpha\) is of particular interest; it is known as the highest posterior density interval (HPDI).

The equal-tailed interval (ETI) is sometimes reported as a robust alternative. This interval excludes \(\alpha/2\) probability from each tail of the distribution and always includes the median.
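Both intervals are easy to obtain from posterior samples. Below is a minimal sketch using the illustrative samples from the grid approximation above; the 94% level and the sorted-window search for the HPDI (which assumes a unimodal posterior) are assumptions of this sketch:

```python
def hpdi(samples, prob=0.94):
    """Shortest interval containing `prob` of the posterior mass
    (assumes a unimodal posterior)."""
    s = np.sort(samples)
    k = int(np.floor(prob * len(s)))   # number of samples the interval must span
    widths = s[k:] - s[:len(s) - k]    # widths of all candidate intervals
    i = np.argmin(widths)              # index of the shortest one
    return s[i], s[i + k]

eti = np.percentile(samples, [3, 97])  # 94% equal-tailed interval
hpd = hpdi(samples, prob=0.94)         # 94% highest posterior density interval
```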

8.5. Simulate from the posterior predictive distribution#

We might also be interested in making inferences about the variable of interest \(Y\). To do so, we can simulate a sample from the posterior predictive distribution in the following way: once we have a sample from the posterior distribution, \(\tilde\theta_1,\ldots,\tilde\theta_S\), to generate the sample \(\tilde{Y}_1,\ldots, \tilde{Y}_S\) from the posterior predictive distribution we simply simulate \(\tilde{Y}_s\sim p(Y|\tilde\theta_s)\).
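In the Beta-Binomial sketch above, each posterior draw of \(\theta\) generates one replicated count (`n`, `rng`, and `samples` are the illustrative objects defined earlier):

```python
# Posterior predictive sample: one simulated dataset per posterior draw
y_rep = rng.binomial(n, samples)
```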

In the notebook 09_BetaBinomialGrid.ipynb in the course repository, we explore the grid approximation for the Beta-Binomial model and show how to use it to make statistical inferences.

8.6. Simulate from the prior predictive distribution#

We can simulate a sample from the prior predictive distribution as well. To do so, we just need to simulate a sample of the parameters, \(\tilde\theta_1,\ldots,\tilde\theta_S\), from the prior distribution and then simulate \(\tilde{Y}_s\sim p(Y|\tilde\theta_s)\).
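For the illustrative Beta-Binomial setup used throughout, this amounts to drawing \(\theta\) from the prior instead of the posterior:

```python
# Prior predictive sample: parameters from the prior, then simulated data
theta_prior = rng.beta(1, 1, size=10_000)  # the illustrative flat prior
y_prior = rng.binomial(n, theta_prior)
```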

Simulating from the prior predictive can help us discriminate between candidate prior distributions. When we simulate from the prior predictive, we can observe the effect of the prior distribution on the variable of interest \(Y\). Many of the conventional techniques for specifying the prior distribution can generate absurd results, and simulating from the prior predictive helps us detect them.