Dawn Blossoms Plucked at Dusk

The weightless old days just slipped away like that; by the time I turned back to look, years had already hurried by.

Coupling from the Past and Lozenge Tilings


This is our final project on perfectly random sampling; the previous two projects on this topic are the domino shuffling algorithm on Aztec diamonds and Wilson's uniform spanning tree algorithm.

Requirements: tqdm and cairocffi.

Usage: just run python main.py.

Lozenge tilings, plane partitions and non-intersecting paths

Again, let's consider a seemingly elementary enumeration problem:

Problem 1: Let \(H(a, b, c)\) be a hexagon on the hexagonal lattice with side lengths \(a,b,c\) where \(a,b,c\) are all positive integers:

Our question is: how many different lozenge tilings of \(H(a,b,c)\) are there?

The following image shows a lozenge tiling of \(H(6, 6, 6)\), with the three kinds of lozenges drawn in different colors.

If we imagine viewing this tiling from a bird's-eye view in 3D, it looks much like stacks of boxes piled in the corner of a room: the ceiling of the room has height \(c\) and the floor has size \(a\times b\). These stacks of boxes also satisfy a certain descending rule: the height of an inner stack must be greater than or equal to the height of an outer stack. To be precise, we can use a \(b\times a\) matrix \(A=(a_{ij})\) to represent these stacks: \(a_{ij}\) is the height of the stack at \((i,j)\). The matrix for the tiling in the above image is

\[A=\begin{matrix}6&6&6&5&4&2\\5&5&4&3&2&1\\2&2&2&2&2&1\\2&2&1&1&1&1\\2&1&1&1&1&1\\1&1&1&1\end{matrix}.\]

(zeros are usually omitted by convention.)

The descending rule now translates to the following conditions (a small validity check in code is sketched after this list):

  1. \(a_{ij}\) is an integer and satisfies \(0\leq a_{ij}\leq c\) for all \((i, j)\).
  2. Each row of \(A\) is weakly decreasing from left to right: \(a_{ij}\geq a_{i,j+1}\) for all \(i, j\).
  3. Each column of \(A\) is weakly decreasing from top to bottom: \(a_{ij}\geq a_{i+1, j}\) for all \(i, j\).
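For concreteness, here is a small check of these three conditions, a minimal sketch assuming the partition is stored as a \(b\times a\) list of lists of integers (the function name is mine, not something defined in main.py):

    def is_boxed_plane_partition(A, c):
        """Check conditions 1-3 above for a b x a matrix A with entries in 0..c."""
        b, a = len(A), len(A[0])
        for i in range(b):
            for j in range(a):
                if not (0 <= A[i][j] <= c):
                    return False                       # condition 1: heights stay in [0, c]
                if j + 1 < a and A[i][j] < A[i][j + 1]:
                    return False                       # condition 2: rows weakly decreasing
                if i + 1 < b and A[i][j] < A[i + 1][j]:
                    return False                       # condition 3: columns weakly decreasing
        return True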

Such a matrix \(A\) is called a boxed plane partition with parameters \((a,b,c)\) and the enumeration problem now becomes:

Problem 2: How many boxed plane partitions with parameters \((a,b,c)\) are there? In other words, how many \(b\times a\) integer matrices \(A\) are there satisfying \(0\leq a_{ij}\leq c\) such that all rows and columns of \(A\) are non-increasing sequences?
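For tiny parameters one can make Problem 2 concrete by brute force; the following throwaway sketch (not part of main.py) simply enumerates all candidate matrices:

    from itertools import product

    def count_boxed_plane_partitions(a, b, c):
        """Count b x a matrices with entries in 0..c whose rows and columns are
        weakly decreasing, by exhaustive enumeration (only feasible for tiny sizes)."""
        count = 0
        for flat in product(range(c + 1), repeat=a * b):
            A = [flat[i * a:(i + 1) * a] for i in range(b)]
            rows_ok = all(A[i][j] >= A[i][j + 1] for i in range(b) for j in range(a - 1))
            cols_ok = all(A[i][j] >= A[i + 1][j] for i in range(b - 1) for j in range(a))
            if rows_ok and cols_ok:
                count += 1
        return count

    # count_boxed_plane_partitions(2, 2, 2) == 20, which matches MacMahon's formula below.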

The answer is a quite neat and astonishing formula given by MacMahon: \[\prod_{i=1}^a\prod_{j=1}^b\prod_{k=1}^c\frac{i+j+k-1}{i+j+k-2},\] which suggests this problem should also have a beautiful and astonishing solution, and it does: the Gessel–Viennot lemma on non-intersecting paths. I'm not going to repeat the proof here, but I will explain the one-to-one correspondence between boxed plane partitions and non-intersecting paths. This correspondence becomes quite obvious once you have adopted the 3D view: each "layer" of the boxes corresponds to a lattice path, and there are \(c\) such paths. See the image below:

These paths are in fact Gauss paths on a 2D square grid: just imagine looking at the boxes directly from above. Each path is a Gauss path from \((0, 0)\) to \((a,b)\); in each step it either moves right or moves up.

The descending rule of plane partitions requires that the path from an upper layer must lie on top of the path from a lower layer; they may "touch" but cannot "cross". So if we translate the \(i\)-th path by \((-i,i)\), all these paths become pairwise disjoint; this is called a system of non-intersecting paths:

Note that since there are \(c\) layers of boxes, there are also \(c\) paths. In the above GIF image I also added two bounding paths (the topmost one and the bottommost one). These two paths are used as auxiliary paths in the code for convenience; they do not really belong to the path system.

This correspondence can be reversed: one can easily recover the plane partition (hence the lozenge tiling) from such a non-intersecting path system. Understanding this correspondence is pivotal for running the coupling from the past algorithm on lozenge tilings.
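For concreteness, here is one way to read off the \(c\) lattice paths from a boxed plane partition. This is only a sketch of the forward direction of the correspondence; the orientation conventions used in main.py may well differ:

    def layers_to_paths(A, c):
        """For each level k = 1..c, encode the layer {(i, j) : A[i][j] >= k} as a
        string of 'R' (right) and 'U' (up) steps from (0, 0) to (a, b)."""
        b, a = len(A), len(A[0])
        paths = []
        for k in range(1, c + 1):
            # number of stacks in each column reaching height k (weakly decreasing in j)
            col = [sum(1 for i in range(b) if A[i][j] >= k) for j in range(a)]
            steps, height = [], 0
            for j in range(a):
                target = b - col[j]              # height of the path while crossing column j
                steps.append('U' * (target - height))
                steps.append('R')
                height = target
            steps.append('U' * (b - height))     # finish climbing up to (a, b)
            paths.append(''.join(steps))
        return paths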

Now comes our question on perfectly random sampling of lozenge tilings:

Problem 3: How can we choose a random tiling of \(H(a,b,c)\) with uniform probability?

As MacMahon's formula suggests, the number of tilings of \(H(a,b,c)\) grows rapidly as \(a,b,c\) get large, so we need a smarter way to generate random tilings. The technique we will use is "coupling from the past", proposed by Propp and Wilson.
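MacMahon's formula is easy to evaluate exactly, and doing so shows just how quickly the count grows; a small sketch (the function name is mine):

    from fractions import Fraction

    def num_tilings(a, b, c):
        """Evaluate MacMahon's product formula for the number of tilings of H(a, b, c)."""
        n = Fraction(1)
        for i in range(1, a + 1):
            for j in range(1, b + 1):
                for k in range(1, c + 1):
                    n *= Fraction(i + j + k - 1, i + j + k - 2)
        return int(n)    # the product is always an integer

    # num_tilings(2, 2, 2) == 20, and the count explodes as a, b, c grow,
    # so naive enumeration or rejection sampling is hopeless.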

Coupling from the past

Suppose we are given a finite, ergodic Markov chain \(M\) with state space \(S\) and stationary distribution \(\pi\). How can we sample a random state of \(M\) from the distribution \(\pi\)? The usual approach is to pick an arbitrary initial state \(s\in S\) and run the chain for a large number of iterations; the distribution of the final state can be made arbitrarily close to \(\pi\) if the number of iterations is large enough. The drawback of this approach is that for a general Markov chain, the number of iterations (the mixing time) needed to approximate \(\pi\) within a given error bound can be quite hard to determine, and the distribution of the final state is never exactly equal to \(\pi\).

Propp and Wilson proposed an algorithm called "coupling from the past" (CFTP) to overcome this. The idea is that instead of running one chain forward, one runs \(|S|\) chains (one chain for each state in \(S\)) from the past until all chains meet at time 0 (they are then said to be coupled); this common final state is a perfect sample, distributed exactly according to \(\pi\).

The precise formulation of the algorithm requires the random mapping representation of a Markov chain: suppose there is a fixed, deterministic function \(f: S\times [0, 1]\to S\) such that for a random variable \(U\) uniformly distributed on \([0, 1]\) we have \[\mathbb{P}(f(x, U)=y) = p_{x,y},\] where \(p_{x,y}\) is the transition probability from \(x\) to \(y\); then we call \(f\) a random mapping representation of \(M\). It can be proved that any finite Markov chain has at least one such representation.
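As a sketch of one standard way to build such an \(f\) (not the representation used for tilings, which is described later): partition \([0,1)\) into intervals whose lengths are the transition probabilities out of \(x\):

    def random_mapping(transitions):
        """Inverse-CDF construction: transitions maps each state x to a list of
        (y, p_xy) pairs summing to 1, and f(x, u) returns y when u falls in the
        sub-interval of [0, 1) of length p_xy assigned to y."""
        def f(x, u):
            acc = 0.0
            for y, p in transitions[x]:
                acc += p
                if u < acc:
                    return y
            return transitions[x][-1][0]    # guard against floating-point round-off
        return f

    # e.g. a lazy walk on two states:
    # f = random_mapping({0: [(0, 0.5), (1, 0.5)], 1: [(1, 0.75), (0, 0.25)]})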

If we have an i.i.d. sequence of \(U\)'s, \((u_1,u_2,\ldots,u_n,\ldots)\), then we can use this sequence as the source of randomness to run the \(|S|\) realizations of the chain from the past: choose an integer \(n>0\) and for each \(s\in S\) define \[x_s^{-n}=s,\quad x_s^{-(n-1)}=f(x_s^{-n}, u_n),\quad\cdots,\quad x_s^{-1}=f(x_s^{-2}, u_2),\quad x_s^0=f(x_s^{-1}, u_1).\] The key observations of Propp and Wilson are:

  1. If two chains started from \(s\) and \(s'\) are equal at some time \(-n\leq t\leq 0\): \(x_s^t=x_{s'}^t\), then they are coupled and will continue to move in lockstep for all subsequent iterations. If all \(|S|\) chains are coupled by some time \(-n\leq t\leq 0\), then the distribution of their common state at time 0 is the stationary distribution \(\pi\).
  2. If these chains do not couple, i.e. they do not all end at the same state at time 0, then one simply looks farther back into the past, say starting from time \(-2n\), and reuses the same random sequence to run all the versions of the chain (a generic sketch of this doubling scheme follows the list).
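Putting the two observations together gives the generic (non-monotone) algorithm; a minimal sketch, assuming \(f\) is a pure function returning a new state and that states can be compared with ==:

    import random

    def cftp(states, f, seed=0):
        """Run one chain per state from farther and farther in the past, reusing
        the same randomness u_t for the same time step in every round."""
        rng = random.Random(seed)
        us = []                               # us[t - 1] plays the role of u_t
        n = 1
        while True:
            while len(us) < n:
                us.append(rng.random())
            finals = []
            for s in states:
                x = s
                for t in range(n, 0, -1):     # times -n, -(n-1), ..., -1
                    x = f(x, us[t - 1])       # u_t always drives the step -t -> -(t-1)
                finals.append(x)
            if all(x == finals[0] for x in finals):
                return finals[0]              # all chains coupled at time 0
            n *= 2                            # not coupled: go twice as far back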

Monotone coupling

When the state space \(S\) is large, running all \(|S|\) chains and waiting for them to couple may be very time-consuming. But if \(S\) has some kind of "monotone structure", the algorithm can be simplified significantly: suppose there is a partial order \(\preceq\) on \(S\) such that:

  1. If \(x\preceq y\) then for any \(u\in[0,1]\) we have \(f(x,u)\preceq f(y,u)\). In this case we say \(f\) respects, or is compatible with, the partial order \(\preceq\).
  2. There is a maximal element \(s_M\) and a minimal element \(s_m\), i.e. for all \(x\in S\) it holds that \(s_m\preceq x\preceq s_M\). Then one can simply run two versions of the chain, one from \(s_M\) and one from \(s_m\): once these two chains couple, they also "squeeze" every other initial state into the same coupling (a sketch follows this list).
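A minimal sketch of the resulting monotone version, under the same assumptions as the generic sketch above (in particular \(f\) must return a new state rather than modify its argument):

    import random

    def monotone_cftp(s_min, s_max, f, seed=0):
        """Monotone CFTP: only the chains started from the minimal and maximal
        states are simulated; when they meet, every other start is squeezed in between."""
        rng = random.Random(seed)
        us = []
        n = 1
        while True:
            while len(us) < n:
                us.append(rng.random())
            lo, hi = s_min, s_max
            for t in range(n, 0, -1):         # run from time -n up to time 0
                lo, hi = f(lo, us[t - 1]), f(hi, us[t - 1])
            if lo == hi:
                return lo                     # the two extreme chains have coupled
            n *= 2                            # otherwise restart from twice as far back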

This is exactly the case for boxed plane partitions (and hence for lozenge tilings): the partial order on plane partitions is the one defined by the heights of the stacks of boxes. Two plane partitions satisfy \(P_1\preceq P_2\) if and only if each stack in \(P_1\) is at most as high as the corresponding stack in \(P_2\), or in other words, the boxes of \(P_1\) form a subset of the boxes of \(P_2\). The minimal element is the partition containing no boxes and the maximal element is the one completely filled with boxes.

What is the random mapping compatible with this partial order? It is defined as follows: for a plane partition \(P\), choose a uniformly random position in the room and try to add or remove a cube at this position (each with probability 1/2); if the resulting configuration is not a valid plane partition, leave \(P\) unchanged. One can easily verify that this defines a Markov chain on the set of plane partitions (hence on lozenge tilings), and this chain is irreducible, aperiodic and symmetric, hence its stationary distribution is the uniform one.
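In matrix form this update might look as follows; a sketch only, since the actual program works with the path representation described in the next section. Note that in CFTP the randomly chosen position together with the coin flip plays the role of \(u\) and has to be shared between the coupled chains:

    import random

    def try_cube_update(A, c, rng=random):
        """Pick a random stack, flip a coin, and add or remove a cube there if the
        descending rule still holds; otherwise leave A (modified in place) unchanged."""
        b, a = len(A), len(A[0])
        i, j = rng.randrange(b), rng.randrange(a)
        if rng.random() < 0.5:                           # try to add a cube
            new = A[i][j] + 1
            ok = (new <= c
                  and (i == 0 or new <= A[i - 1][j])
                  and (j == 0 or new <= A[i][j - 1]))
        else:                                            # try to remove a cube
            new = A[i][j] - 1
            ok = (new >= 0
                  and (i == b - 1 or new >= A[i + 1][j])
                  and (j == a - 1 or new >= A[i][j + 1]))
        if ok:
            A[i][j] = new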

About the code

In the program, lozenge tilings are represented by non-intersecting paths in the tilted \(ac\)-plane: a tiling is encoded by a set of \(c\) non-intersecting paths \(P=\{p_1, p_2, \ldots, p_c\}\), where the \(i\)-th path \(p_i\) starts at \((0, i)\), either moves up or moves down in each step, and ends at \((a+b, b+i)\). The attempt to add/remove a box at a position corresponds to pushing up/down a path at a chosen point.

        /\
    |  /  \
    | /    \
    |/      \
 y-axis      |
    |   /    |
  c |  /     |
    | / b    |
    |/       |
     \      /
    a \    /
       \  /
        \/
      x-axis

In this coordinate system, the x-axis is in the \(a\)-direction and the y-axis is in the \(c\)-direction. For a path \(p\), each step either moves up in the \(b\)-direction or moves down in the \(a\)-direction. If it moves up, the \(x\) and \(y\) coordinates both increase by 1; if it moves down, only the \(x\) coordinate increases by 1.

In our notation, \(P[k][j]\) is the \(y\)-coordinate of the \(j\)-th point in the \(k\)-th path.

A partial order \(\preceq\) can be defined on the set of path systems: for two path systems \(P\) and \(Q\), \(P\preceq Q\) if and only if \(P[k][j] \leq Q[k][j]\) for all \(k\) and \(j\). The maximal path system in this ordering is the one in which every path first moves up \(b\) steps and then moves down \(a\) steps, whereas the minimal path system is the one in which every path first moves down \(a\) steps and then moves up \(b\) steps.
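Under this indexing (the \(k\)-th path stored as its list of \(y\)-coordinates, starting at height \(k\)), the two extreme path systems could be built as follows; a sketch, since the exact data layout in main.py may differ:

    def extreme_path_systems(a, b, c):
        """Return (maximal, minimal) path systems; the list at index k - 1 holds the
        y-coordinates P[k][0..a+b] of the k-th path."""
        p_max = [[k + min(j, b) for j in range(a + b + 1)]       # up b steps, then down a steps
                 for k in range(1, c + 1)]
        p_min = [[k + max(0, j - a) for j in range(a + b + 1)]   # down a steps, then up b steps
                 for k in range(1, c + 1)]
        return p_max, p_min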

The update rule for this Markov chain is:

  1. Choose a uniformly random interior point of \(P\); let this point be the \(j\)-th point on the \(k\)-th path.
  2. Try to update \(P\) at \(P[k][j]\): with probability 1/2 we try to "push up" \(P\) at \(P[k][j]\) (increase \(P[k][j]\) by 1) and with probability 1/2 we try to "push down" \(P\) at \(P[k][j]\) (decrease \(P[k][j]\) by 1). If after this operation \(P\) still represents a valid tiling, i.e. the paths are still non-intersecting, then we have obtained a new path system; otherwise we leave \(P\) untouched (a sketch of this update follows the list).
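A sketch of this update, assuming the path system is stored with the two auxiliary bounding paths mentioned earlier at indices 0 and \(c+1\), so that the real paths are P[1], ..., P[c] (again, the actual layout in main.py may differ):

    import random

    def try_path_update(P, a, b, rng=random):
        """Pick a random interior point of a random real path and try to push it up
        or down by 1, keeping both the path shape and non-intersection intact."""
        c = len(P) - 2                        # P[0] and P[c + 1] are the bounding paths
        k = rng.randrange(1, c + 1)           # which path
        j = rng.randrange(1, a + b)           # which interior point
        q = P[k][j] + (1 if rng.random() < 0.5 else -1)
        # consecutive heights along a path may differ only by 0 (down step) or 1 (up step) ...
        shape_ok = (P[k][j - 1] <= q <= P[k][j - 1] + 1) and (q <= P[k][j + 1] <= q + 1)
        # ... and the point must stay strictly between its two neighbouring paths
        order_ok = P[k - 1][j] < q < P[k + 1][j]
        if shape_ok and order_ok:
            P[k][j] = q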