Some examples of RJ-MCMC

3/29/2016

Model Selection with RJ-MCMC

(This was largely expanded from notes from Pierre Jacob at Harvard)

  • \( \left(\mathcal{M}_{m}\right)_{m\in\mathbb{N}} \): a collection of models
  • \( \theta_{m}\in \mathcal{H}_m \): associated model parameter and space
  • \( \mathcal{H} = \bigcup_{m\in\mathbb{N}}\{\mathcal{M}_{m}\}\times \mathcal{H}_m \): full parameter space (a concrete instance is sketched just below)
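As a concrete instance (anticipating the example at the end of these notes), if \( \mathcal{M}_1 \) is an exponential model with a single rate parameter and \( \mathcal{M}_2 \) is a gamma model with shape and rate, then

\[ \mathcal{H}_1 = (0,\infty), \quad \mathcal{H}_2 = (0,\infty)^2, \quad \mathcal{H} = \left(\{\mathcal{M}_1\}\times(0,\infty)\right) \cup \left(\{\mathcal{M}_2\}\times(0,\infty)^2\right). \]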

Some distributions we can choose:

  • \( p(\theta_{m}\mid\mathcal{M}_{m}) \): prior distribution
  • \( p\left(Y\mid\theta_{m},\mathcal{M}_{m}\right) \): likelihood
  • \( p\left(\mathcal{M}_{m}\right) \): prior on models

What we hope to target

The two posterior distributions.

The easy one:

\[ \pi\left(\theta_{m}\mid \mathcal{M}_{m}, Y\right) \propto p\left(Y \mid \theta_{m},\mathcal{M}_{m}\right)p\left(\theta_{m}\mid\mathcal{M}_{m}\right) \]

The hard one:

\[ \pi\left(\mathcal{M}_{m},\theta_{m}\mid Y\right) \propto p\left(Y \mid \theta_{m},\mathcal{M}_{m}\right)p\left(\theta_{m}\mid\mathcal{M}_{m}\right)p\left(\mathcal{M}_{m}\right) \]

We'll ignore the first one for today; we've already seen plenty of methods for it. From here on, denote the second posterior by \[ \pi\left(m,\theta_{m}\right) \]
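To see why the second target is the hard one, note that marginalizing it over \( \theta_{m} \) gives the posterior model probabilities, whose normalizing constant mixes a sum over models with an integral over each parameter space:

\[ \pi\left(\mathcal{M}_{m}\mid Y\right) = \frac{p\left(\mathcal{M}_{m}\right)\int_{\mathcal{H}_m} p\left(Y\mid\theta_{m},\mathcal{M}_{m}\right)p\left(\theta_{m}\mid\mathcal{M}_{m}\right)\,d\theta_{m}}{\sum_{m^{\prime}} p\left(\mathcal{M}_{m^{\prime}}\right)\int_{\mathcal{H}_{m^{\prime}}} p\left(Y\mid\theta_{m^{\prime}},\mathcal{M}_{m^{\prime}}\right)p\left(\theta_{m^{\prime}}\mid\mathcal{M}_{m^{\prime}}\right)\,d\theta_{m^{\prime}}}. \]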

Between-Model Moves:

\( (m,\theta_{m})\in \mathcal{H} \)

  • So far \( \mathcal{H}_m \) is unrestricted (models can have differing numbers of parameters)
  • The MCMC chain we create for this pair will move in \( \mathcal{H} \)

Consider proposals of the form: \[ q(m\to m^{\prime})q_{m\to m^{\prime}}(\theta\to\theta^{\prime})d\theta^{\prime} \]

  1. propose changing models: \( q(m\to m^{\prime}) \) (a minimal code sketch of this stage follows this list)
  2. propose the parameters in the new model: \( \theta^{\prime}\in\mathcal{H}_{m^{\prime}} \)
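Here is a minimal Python sketch of stage 1, assuming we simply pick uniformly among the other models; `propose_model` is an illustrative helper, not something from the original notes. Stage 2 is where dimension matching and the map \( G_{m\to m^{\prime}} \) come in, described next.

```python
import numpy as np

rng = np.random.default_rng(0)

def propose_model(m, n_models):
    """Stage 1: draw m' from q(m -> .), here uniform over the other models."""
    others = [k for k in range(n_models) if k != m]
    m_new = int(rng.choice(others))
    q_fwd = 1.0 / (n_models - 1)   # q(m -> m')
    q_rev = 1.0 / (n_models - 1)   # q(m' -> m); symmetric for this choice
    return m_new, q_fwd, q_rev
```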

Dimension matching

Propose \( m^{\prime} \) from \( q(m\to m^{\prime}) \)

\( \theta \) and \( \theta^{\prime} \) can be of different dimensions, so use auxiliary variables to match the dimensions: \[ \dim\left((\theta,u)\right) = \dim\left((\theta^{\prime},u^{\prime})\right) \]

  • If increasing dimensionality, then \( u' \) could be empty
  • If decreasing dimensionality, then \( u \) could be empty

The overall idea is then to transform the old variables \( (\theta,u) \) into the new variables \( (\theta^{\prime},u^{\prime}) \).
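For instance, for a jump from a one-parameter model to a two-parameter model (as in the exponential-versus-gamma example at the end of these notes), we can take \( u \) to be a scalar and \( u^{\prime} \) to be empty, so that

\[ \dim\left((\theta,u)\right) = 1 + 1 = 2 = \dim\left(\theta^{\prime}\right) = \dim\left((\theta^{\prime},u^{\prime})\right). \]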

Transformations

Auxiliary variables are used to match dimensions; we can draw them from arbitrary distributions \[ u \sim \varphi_{m\to m^{\prime}}(\cdot) \ \text{and} \ u^{\prime} \sim \varphi_{m^{\prime}\to m}(\cdot). \]

Recall the acceptance probability: it involves the derivative of this transformation, so choose a nice one! \[ (\theta^{\prime},u^{\prime})=G_{m\to m^{\prime}}(\theta,u) \]

For example, a diffeomorphism (a concrete choice for the exponential-to-gamma move is worked out after this list):

  • differentiable in all coordinates
  • invertible
  • inverse also differentiable
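For the exponential-to-gamma jump, one illustrative choice (my own for illustration, not prescribed by the notes, and assuming \( \beta \) is a rate parameter) matches the means \( 1/\lambda = \alpha/\beta \): draw a scalar \( u \sim \varphi_{Exp\to G} \) and set

\[ (\alpha,\beta) = G_{Exp\to G}(\lambda,u) = (u,\, \lambda u), \qquad \left\vert \frac{\partial G_{Exp\to G}(\lambda,u)}{\partial(\lambda,u)} \right\vert = \left\vert \det\begin{pmatrix} 0 & 1 \\ u & \lambda \end{pmatrix} \right\vert = u, \]

which is invertible with differentiable inverse \( (\lambda,u) = G_{Exp\to G}^{-1}(\alpha,\beta) = (\beta/\alpha,\, \alpha) \).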

Acceptance Probability

\[ \min\left(1,\frac{\pi(m^{\prime},\theta^{\prime})\,q(m^{\prime}\to m)\,\varphi_{m^{\prime}\to m}(u^{\prime})}{\pi(m,\theta)\,q(m\to m^{\prime})\,\varphi_{m\to m^{\prime}}(u)}\left\vert \frac{\partial G_{m\to m^{\prime}}(\theta,u)}{\partial(\theta,u)}\right\vert \right) \]

This is just the regular old Hastings acceptance probability, but

  • with a slightly more complex two-stage proposal
    • propose a new model: \( m \to m^{\prime} \)
    • propose a way of mapping: \( \mathcal{H}_m \to \mathcal{H}_{m^{\prime}} \)
  • with a Jacobian to adjust for the change of parameter space (a code sketch follows this list)
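In code the acceptance step might look like the sketch below, with everything passed in on the log scale; the function and argument names are placeholders, not an established API.

```python
def rj_log_accept(log_post_new, log_post_old,
                  log_q_rev, log_q_fwd,
                  log_phi_rev, log_phi_fwd,
                  log_abs_jac):
    """Log of min(1, ratio) for a reversible-jump move.

    log_post_*  : log pi(m', theta') and log pi(m, theta)
    log_q_*     : log q(m' -> m) and log q(m -> m')
    log_phi_*   : log phi_{m' -> m}(u') and log phi_{m -> m'}(u)
                  (pass 0.0 when the corresponding u is empty)
    log_abs_jac : log |dG_{m -> m'}(theta, u) / d(theta, u)|
    """
    log_ratio = (log_post_new - log_post_old
                 + log_q_rev - log_q_fwd
                 + log_phi_rev - log_phi_fwd
                 + log_abs_jac)
    return min(0.0, log_ratio)

# Accept the proposed jump with probability exp(rj_log_accept(...)).
```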

Things to consider

More choices play a role in the efficiency of this algorithm than in other MCMC methods:

  • How you choose to transform your parameters, i.e. choosing \( G_{m\to m^{\prime}} \)
  • How you propose changing models, \( q(m\to m^{\prime}) \)
  • How you select the auxiliary variables, \( \varphi_{m\to m^{\prime}}(u) \)

A concrete example

We want to fit the data \( (y_1, \ldots, y_n) \) with one of two models, either \[ y_i \sim Exp(\lambda) \quad \text{or} \quad y_i \sim Gamma(\alpha, \beta). \] Let's be Bayesian: \[ \lambda \sim Gamma(a_1, b_1)\\ \alpha \sim Gamma(a_2, b_2)\\ \beta \sim Gamma(a_3, b_3) \]

We can also put priors on our models (a Python sketch of all these ingredients follows the list):

  • \( p(Exp) \): prior belief that Exponential is the correct model
  • \( p(G) \): prior belief that Gamma is the correct model
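To make the ingredients concrete, here is one way to write the log-likelihoods, log-priors, and joint log-target in Python (a sketch using scipy.stats, assuming \( \lambda \) and \( \beta \) are rate parameters; the hyperparameter values below are arbitrary placeholders).

```python
import numpy as np
from scipy import stats

# arbitrary hyperparameter values, for illustration only
a1, b1 = 1.0, 1.0           # lambda ~ Gamma(a1, b1)
a2, b2 = 1.0, 1.0           # alpha  ~ Gamma(a2, b2)
a3, b3 = 1.0, 1.0           # beta   ~ Gamma(a3, b3)
p_exp, p_gamma = 0.5, 0.5   # prior model probabilities p(Exp), p(G)

def log_lik_exp(y, lam):
    # y_i ~ Exp(lambda), rate parametrization
    return np.sum(stats.expon.logpdf(y, scale=1.0 / lam))

def log_lik_gamma(y, alpha, beta):
    # y_i ~ Gamma(alpha, beta), with beta a rate
    return np.sum(stats.gamma.logpdf(y, a=alpha, scale=1.0 / beta))

def log_prior_exp(lam):
    return stats.gamma.logpdf(lam, a=a1, scale=1.0 / b1)

def log_prior_gamma(alpha, beta):
    return (stats.gamma.logpdf(alpha, a=a2, scale=1.0 / b2)
            + stats.gamma.logpdf(beta, a=a3, scale=1.0 / b3))

def log_target(m, theta, y):
    """log pi(m, theta) up to a constant, with m in {"Exp", "G"}."""
    if m == "Exp":
        return log_lik_exp(y, theta[0]) + log_prior_exp(theta[0]) + np.log(p_exp)
    return (log_lik_gamma(y, theta[0], theta[1])
            + log_prior_gamma(theta[0], theta[1]) + np.log(p_gamma))
```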

Deriving some samplers within models