Perfectly random sampling (3): coupling from the past

Example:

Requirements: numpy and matplotlib.

This is our final project on "perfectly random sampling"; the previous two projects on this topic are the domino shuffling algorithm on Aztec diamonds and Wilson's uniform spanning tree algorithm.

Usage

Just run python cftp.py.

Lozenge tilings, plane partitions and non-intersecting paths

Again, let's consider a seemingly elementary enumeration problem:

Problem 1: let \(H(a,b,c)\) be a hexagon on the hexagonal lattice with side lengths \(a,b,c\) (where \(a,b,c\) are positive integers):

How many different lozenge tilings of \(H(a,b,c)\) are there?

The following image shows a lozenge tiling of \(H(6, 6, 6)\):

The three kinds of lozenges are drawn in three different colors.

If we view this tiling as a 3D picture (as if looking into the corner of a room), it looks much like stacks of boxes piled up in that corner: the ceiling of the room has height \(c\) and the floor has size \(a\times b\). These stacks of boxes also satisfy a certain descending rule: the height of an inner stack (one closer to the corner) must be greater than or equal to the heights of the stacks outside it. To be precise, we can use a \(b\times a\) matrix \(A=(a_{ij})\) to represent these stacks of boxes: \(a_{ij}\) is the height of the stack at position \((i,j)\). So the matrix for the tiling in the above image is \[A=\begin{matrix}6&6&6&5&4&2\\5&5&4&3&2&1\\2&2&2&2&2&1\\2&2&1&1&1&1\\2&1&1&1&1&1\\1&1&1&1\end{matrix}.\] (Zeros are usually omitted by convention.)

The descending rule now translates to:

  1. \(a_{ij}\) is an integer and satisfies \(0\leq a_{ij}\leq c\) for all \((i, j)\).
  2. Each row of \(A\) is weakly decreasing from left to right: \(a_{ij}\geq a_{i,j+1}\) for all \(i\).
  3. Each column of \(A\) is weakly decreasing from top to bottom: \(a_{ij}\geq a_{i+1, j}\) for all \(j\).

Such a matrix \(A\) is called a boxed plane partition with parameters \((a, b, c)\), and the enumeration problem now becomes

Problem 2: How many boxed plane partitions with parameters \((a, b, c)\) are there? In other words, how many \(b\times a\) integer matrices \(A\) are there satisfying \(0\leq a_{ij}\leq c\) such that all rows and columns of \(A\) are non-increasing sequences?
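To make the three conditions concrete, here is a minimal check in Python (using numpy, as this project does); the helper name is mine and this is only an illustrative sketch, not code taken from cftp.py. The test matrix is the one of the \(H(6,6,6)\) tiling above, with the omitted zeros written out.

```python
import numpy as np

def is_boxed_plane_partition(A, c):
    """Check the three conditions: entries lie in [0, c], every row is weakly
    decreasing from left to right, every column is weakly decreasing from top
    to bottom."""
    A = np.asarray(A)
    if np.any(A < 0) or np.any(A > c):
        return False
    rows_ok = np.all(A[:, :-1] >= A[:, 1:])   # each entry >= its right neighbour
    cols_ok = np.all(A[:-1, :] >= A[1:, :])   # each entry >= the entry below it
    return bool(rows_ok and cols_ok)

# The matrix of the H(6, 6, 6) tiling shown above, zeros written out explicitly.
A = [[6, 6, 6, 5, 4, 2],
     [5, 5, 4, 3, 2, 1],
     [2, 2, 2, 2, 2, 1],
     [2, 2, 1, 1, 1, 1],
     [2, 1, 1, 1, 1, 1],
     [1, 1, 1, 1, 0, 0]]
print(is_boxed_plane_partition(A, c=6))   # True
```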

The answer is a remarkably neat formula due to MacMahon: \[\prod_{i=1}^a\prod_{j=1}^b\prod_{k=1}^c\frac{i+j+k-1}{i+j+k-2},\] which suggests that this problem should also have a beautiful solution, and it does: the key tool is the Gessel–Viennot lemma on non-intersecting paths. I will not repeat the proof here, but I will explain the one-to-one correspondence between boxed plane partitions and non-intersecting paths. This correspondence becomes quite obvious once you have adopted the 3D view: each "layer" of the boxes corresponds to a lattice path, and there are \(c\) such paths. See the image below.

These paths are in fact Gauss paths on the 2D square grid: just imagine looking at the boxes directly from above. Each path is a Gauss path from \((0, 0)\) to \((a, b)\) that moves either right or up at each step.

The descending rule of plane partitions requires that the path of an upper layer must lie on top of the path of a lower layer: the two paths may touch but can never cross. So if we translate the \(i\)-th path by \((-i, i)\), all these paths become pairwise disjoint; such a family is called a system of non-intersecting paths:

Note that since there are \(c\) layers of boxes, there are also \(c\) paths. In the above GIF image I also added two bounding paths (the topmost one and the bottommost one). These two are auxiliary paths used for convenience in the coupling from the past algorithm; they do not really belong to the non-intersecting path system.

This correspondence can be reversed: one can easily recover the plane partition (hence the lozenge tiling) from such a non-intersecting path system. Understanding this correspondence is pivotal for running the coupling from the past algorithm on lozenge tilings.
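To illustrate one direction of the correspondence, the sketch below extracts the \(c\) lattice paths from a plane partition matrix: the \(k\)-th path traces the boundary of the region of stacks of height at least \(k\). The orientation convention (start at \((0,0)\), record 'R'/'U' steps) is my own choice for this sketch and may differ from the convention used in cftp.py and in the images.

```python
import numpy as np

def layer_paths(A, c):
    """Convert a boxed plane partition (a b x a matrix with entries in [0, c])
    into c monotone lattice paths, one per layer of boxes.  The k-th path is
    the boundary of the region {(i, j) : a_ij >= k}, written as a string of
    'R' (right) and 'U' (up) steps from (0, 0) to (a, b)."""
    A = np.asarray(A)
    b, a = A.shape
    paths = []
    for k in range(1, c + 1):
        # number of stacks of height >= k in each row, read from the bottom row up
        row_lengths = [int(np.sum(A[i] >= k)) for i in range(b - 1, -1, -1)]
        steps, x = [], 0
        for length in row_lengths:
            steps += ['R'] * (length - x) + ['U']
            x = length
        steps += ['R'] * (a - x)
        paths.append(''.join(steps))
    return paths

# A tiny 2 x 2 x 2 example with stack heights 2, 1 / 1, 0.
print(layer_paths([[2, 1], [1, 0]], c=2))   # ['RURU', 'URUR']
```

In this small example the layer-2 path lies weakly above the layer-1 path, exactly as the descending rule demands.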

Now comes our question on perfectly random sampling of lozenge tilings:

Problem 3: how can we choose a random tiling of \(H(a,b,c)\) with uniform probability?

As MacMahon's formula suggests, the number of tilings of \(H(a,b,c)\) grows rapidly when \(a,b,c\) are large, hence we need a smarter way to generate random tilings. The technique we implement in this program is called "coupling from the past".
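To get a feeling for how fast the count grows, MacMahon's product can be evaluated exactly with rational arithmetic; the snippet below is only a quick sketch, not part of cftp.py.

```python
from fractions import Fraction

def macmahon(a, b, c):
    """Evaluate MacMahon's product formula for the number of boxed plane
    partitions with parameters (a, b, c), i.e. lozenge tilings of H(a, b, c)."""
    count = Fraction(1)
    for i in range(1, a + 1):
        for j in range(1, b + 1):
            for k in range(1, c + 1):
                count *= Fraction(i + j + k - 1, i + j + k - 2)
    return int(count)

print(macmahon(2, 2, 2))   # 20
print(macmahon(6, 6, 6))   # the count for the H(6, 6, 6) hexagon shown above is already enormous
```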

Coupling from the past

Suppose we are given a finite, ergodic Markov chain \(M\) with state space \(S\) and stationary distribution \(\pi\). How can we sample a state of \(M\) from the distribution \(\pi\)? The usual approach is to pick an arbitrary initial state \(s\in S\) and run the chain for many iterations; the distribution of the final state can then be made arbitrarily close to \(\pi\) by running long enough. The drawback of this approach is that for a general Markov chain the number of iterations (the mixing time) needed to approximate \(\pi\) within a given error bound is quite hard to determine, and the distribution of the final state is never exactly equal to \(\pi\).

Propp and Wilson proposed an algorithm called "coupling from the past" (CFTP) to overcome this. The idea is that instead of running one chain forward, one runs \(|S|\) chains (one for each state) from the past; if all these chains have met (coupled) by time 0, then their common final state is an exact sample from \(\pi\).

The precise formulation of the algorithm requires a random mapping representation of the Markov chain \(M\): suppose there is a fixed, deterministic map \(f: S\times [0, 1]\to S\) such that for a random variable \(U\) uniformly distributed on \([0, 1]\) we have \[\mathbb{P}(f(x, U)=y) = p_{x,y},\] where \(p_{x,y}\) is the transition probability from \(x\) to \(y\); then we call \(f\) a random mapping representation of \(M\). It can be proved that any finite Markov chain has such a representation.
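For a chain given explicitly by its transition matrix, one concrete choice of \(f\) is the inverse-CDF construction: partition \([0,1]\) according to the cumulative sums of row \(x\) and map \(u\) to the sub-interval it falls into. The sketch below uses a toy chain; the names are mine, not taken from cftp.py.

```python
import numpy as np

def random_mapping(P):
    """Build a random mapping representation f(x, u) from a transition matrix P:
    cut [0, 1] into intervals of lengths p_{x,0}, p_{x,1}, ... and send u to the
    index of the interval containing it, so that P(f(x, U) = y) = p_{x,y}."""
    cumulative = np.cumsum(P, axis=1)
    def f(x, u):
        return int(np.searchsorted(cumulative[x], u, side='right'))
    return f

# A toy 3-state chain.
P = np.array([[0.50, 0.25, 0.25],
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])
f = random_mapping(P)
rng = np.random.default_rng(0)
u = rng.random()
print(u, [f(x, u) for x in range(3)])   # the same u drives every copy of the chain
```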

If we have a single i.i.d. sequence of \(U\)'s, \((u_1,u_2,\ldots,u_n,\ldots)\), then we can use this sequence as the common source of randomness to run the \(|S|\) realizations of the chain from the past forward to time 0: choose an integer \(n>0\) and for each \(s\in S\) define \[x_s^{-n}=s, x_s^{-(n-1)}=f(x_s^{-n}, u_n),\cdots,x_s^{-1}=f(x_s^{-2}, u_2), x_s^0=f(x_s^{-1}, u_1).\]

The key observations of Propp and Wilson are:

  1. If two chains started from \(s\) and \(s'\) are equal at some time \(-n\leq t\leq 0\), i.e. \(x_s^t=x_{s'}^t\), then they are coupled and will move in lockstep for all subsequent iterations. If all \(|S|\) chains are coupled at some time \(-n\leq t\leq 0\), then the distribution of their common final state \(x^0\) is the stationary distribution \(\pi\).

  2. If these chains do not couple, i.e. they do not all end at the same state at time 0, then one simply looks farther back into the past, say starting from time \(-2n\), and re-uses the same random sequence to run all the versions of the chain.
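Putting the two observations together gives the general CFTP procedure. The sketch below (for the same kind of toy chain as above; the function name and the doubling schedule details are my own choices, not code from cftp.py) re-uses the stored random numbers \(u_1,\ldots,u_n\) every time it restarts farther in the past.

```python
import numpy as np

def cftp(states, f, rng):
    """General coupling from the past: run one chain from every state, starting
    further and further back in time (doubling n) while re-using the same random
    numbers u_1, u_2, ..., until all chains agree at time 0."""
    us = []                                    # us[t - 1] is u_t, used for the step from time -t to -(t-1)
    n = 1
    while True:
        while len(us) < n:                     # draw numbers only for the newly added earlier times
            us.append(rng.random())
        current = {s: s for s in states}       # all chains at time -n
        for t in range(n, 0, -1):              # apply f with u_n, u_{n-1}, ..., u_1
            current = {s: f(x, us[t - 1]) for s, x in current.items()}
        if len(set(current.values())) == 1:    # coalesced: the common value is an exact sample from pi
            return next(iter(current.values()))
        n *= 2                                 # otherwise look twice as far back into the past

# Demo on the toy 3-state chain, with the inverse-CDF random mapping rebuilt inline.
P = np.array([[0.50, 0.25, 0.25],
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])
cumulative = np.cumsum(P, axis=1)
f = lambda x, u: int(np.searchsorted(cumulative[x], u, side='right'))
print(cftp(range(3), f, np.random.default_rng(2024)))
```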

Monotone coupling

When the state space \(S\) is large, running all \(|S|\) chains and waiting for them to couple may be very time-consuming. But if \(S\) has some kind of "monotone structure", the algorithm can be simplified significantly: suppose there is a partial order \(\preceq\) on \(S\) such that

  1. If \(x\preceq y\), then for any \(u\in[0, 1]\) we have \(f(x,u)\preceq f(y,u)\). In this case we say \(f\) respects, or is compatible with, the partial order \(\preceq\).
  2. There is a maximal element \(s_M\) and a minimal element \(s_m\), i.e. \(s_m\preceq x\preceq s_M\) holds for all \(x\in S\).

Then one can simply run two versions of the chain, one started from \(s_M\) and one from \(s_m\): once these two chains couple, they also "squeeze" every other initial state into the same coupling.
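Here is a sketch of this monotone version, assuming a random mapping `f` that respects the order; it is demonstrated on a simple walk on \(\{0,\ldots,10\}\) rather than on tilings, and the names are mine, not those used in cftp.py.

```python
import numpy as np

def monotone_cftp(s_min, s_max, f, rng):
    """Monotone CFTP: track only the chains started from the minimal and the
    maximal state; once they meet, every other chain is squeezed in between
    and has met them as well, so the common value is an exact sample."""
    us = []
    n = 1
    while True:
        while len(us) < n:
            us.append(rng.random())
        lower, upper = s_min, s_max
        for t in range(n, 0, -1):              # run from time -n up to time 0
            u = us[t - 1]                      # the same u_t is re-used on every restart
            lower, upper = f(lower, u), f(upper, u)
        if np.array_equal(lower, upper):       # works for scalar as well as array states
            return lower
        n *= 2                                 # not coalesced: start twice as far back

# Demo on a walk on {0, ..., 10} that sticks at the two ends: the same u pushes
# every state in the same direction, so the update respects the natural order.
N = 10
def walk_step(x, u):
    return max(x - 1, 0) if u < 0.5 else min(x + 1, N)

print(monotone_cftp(0, N, walk_step, np.random.default_rng(1)))
```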

This is exactly the case for lozenge tilings (hence boxed plane partitions): the partial order on plane partitions is the natural one defined by the heights of the stacks of boxes. Two plane partitions satisfy \(P_1\preceq P_2\) if and only if each stack of \(P_1\) is at most as high as the corresponding stack of \(P_2\), or in other words, the boxes of \(P_1\) form a subset of the boxes of \(P_2\). The minimal element is the partition containing no boxes at all, and the maximal element is the one in which the room is completely filled with boxes.
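In the matrix representation this order is simply entrywise comparison; a tiny illustration (the helper name is hypothetical, not from cftp.py):

```python
import numpy as np

def precedes(P1, P2):
    """P1 precedes P2 iff every stack of P1 is at most as high as the same stack
    of P2, i.e. the boxes of P1 form a subset of the boxes of P2."""
    return bool(np.all(np.asarray(P1) <= np.asarray(P2)))

a, b, c = 3, 3, 3
s_min = np.zeros((b, a), dtype=int)     # the empty room: no boxes at all
s_max = np.full((b, a), c, dtype=int)   # the room completely filled with boxes
print(precedes(s_min, s_max))           # True
```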

What is a random mapping compatible with this partial order? It is defined as follows: for a plane partition \(P\), choose a position in the room uniformly at random and try to add or remove a box of \(P\) at this position (each with probability \(1/2\)); if the resulting configuration is not a valid plane partition, leave \(P\) unchanged. One can easily verify that this defines a Markov chain on the set of plane partitions (hence lozenge tilings), and that this chain is irreducible, aperiodic and symmetric, hence its stationary distribution is the uniform one.
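Here is a sketch of one step of this chain in the matrix representation, written so that a single uniform number \(u\) encodes both the chosen site and the add/remove coin; this particular encoding is my choice for illustration, and cftp.py works with the path representation instead, as explained below.

```python
import numpy as np

def chain_step(P, c, u):
    """One step of the chain described above, driven by a single uniform u in [0, 1):
    decode a site (i, j) and a coin (add or remove a box) from u, attempt the move,
    and keep P unchanged if the result violates the descending rule."""
    b, a = P.shape
    idx = int(u * (2 * a * b))                 # 2*a*b equally likely (site, coin) pairs
    i, j = divmod(idx % (a * b), a)
    delta = 1 if idx >= a * b else -1          # add or remove a box
    Q = P.copy()
    Q[i, j] += delta
    valid = (0 <= Q[i, j] <= c
             and (j == 0 or Q[i, j] <= Q[i, j - 1])        # row stays weakly decreasing
             and (j == a - 1 or Q[i, j] >= Q[i, j + 1])
             and (i == 0 or Q[i, j] <= Q[i - 1, j])        # column stays weakly decreasing
             and (i == b - 1 or Q[i, j] >= Q[i + 1, j]))
    return Q if valid else P

# Forward simulation from the empty partition, just to watch the chain move.
rng = np.random.default_rng(0)
a, b, c = 4, 4, 4
P = np.zeros((b, a), dtype=int)
for _ in range(2000):
    P = chain_step(P, c, rng.random())
print(P)
```

Since the same \(u\) picks the same site and the same coin for every copy of the chain, this update respects the partial order above, so plugging `f = lambda P, u: chain_step(P, c, u)` together with `s_min` and `s_max` into the `monotone_cftp` sketch yields an exact uniform sample of a boxed plane partition, hence of a lozenge tiling of \(H(a,b,c)\).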

In the program I use non-intersecting paths to represent the boxed plane partitions, and the attempt to add/remove a box at a position corresponds to pushing a path up/down at a chosen point.

References

  1. Markov Chains and Mixing Times, chapter 22, by David A. Levin, Yuval Peres and Elizabeth L. Wilmer.

  2. Wilson's webpage.

  3. Blog post by Possibly Wrong; my code is based on the implementation there.
