# Perfectly random sampling (2): Wilson's uniform spanning tree algorithm

**Example:**

**Requiremens:** `python>=2.7`

and `tqdm`

for showing process bar.

# Usage

Run any of the `example*.py`

scripts and wait for a few seconds, you will see a `.gif`

file generated in current directory, enjoy it!

This program is written in pure python: no third-party modules nor softwares are needed, just built-in modules like `struct, random, colorsys`

and some built-in functions (if you want to embed the animation into a background image then `PIL`

is required). It can make animations of all kinds of maze generation and maze solving algorithms on the 2D grid. It runs very fast and generates optimized GIF files in only a few seconds. I could write it faster by using `numpy`

arrays but I prefer to keep the code being "pure blooded".

**Update:** I have added the process bar feature in the latest version for which the `tqdm`

lib is required.

# History of this program

This program was motivated by Mike Bostock's wonderful Javascript animation. I learned Wilson's algorithm when I was a Ph.D student and had the idea of writing a python version to produce GIF animations of it the first sight when I saw Mike's page, but rendering a GIF image which possibly contains thousands of frames is definitely a formidable task. It was about five years later when I occasionally touched the GIF89a specification that finally realized the approach of encoding the whole process into a byte stream.

# What is Wilson's algorithm

Consider the following problem:

Problem:Let \(G\) be a finite, connected and undirected graph. How can we choose a random spanning tree among all spanning trees of \(G\) with equal probability? (we shall call such a tree anuniform spanning tree, or simply anUST.)

You might say "that's easy, just write a program that lists all spanning trees and then use a random integer to choose one". But let's consider complete graphs \(K_n\) for example: \(K_n\) has \(n^{n-2}\) many different spanning trees by Cayley's formula, for \(n=100\) this number is \(100^{98}\), far more larger than the number of particles in the universe! (which is estimated to be about \(10^{90}\).)

Currently the most efficient algorithm is the one proposed in Wilson's paper

"generating random spanning trees more quickly than the cover time".

It's a random algorithm, which means some times you may be very lucky to get an UST quickly, or you may wait forever. But it can be proven that this algorithm will terminate in finite steps with probability one (note this does not exclude the possibility of running forever, think about this), and it performs really well in most cases.

The key to understand Wilson's algorithm is the so called loop erased random walk, that is, once the random walk visits a vertex that already existed in its path, it immediately erases the loop formed by these two visits and continues the walk from this vertex. Just watch the gif animation if you don't understand this, it's obvious to see what "loop erased random walk" means from the animation.

The algorithm runs as follows:

Wilson's algorithm:

Choose any vertex \(v\) as the root, maintain a tree \(T\), initially \(T=\{v\}\).

For any vertex \(z\) that is not in \(T\), start a loop erased random walk from \(z\), until the walk hits \(T\), then add the resulting path of the walk to \(T\).

repeat step 2 until all vertices of the graph are in \(T\).

The proof of the correctness of this algorithm is a bit tricky and will not be discussed here, you may refer to Wilson's original paper or the book by Russell Lyons and Yuval Peres:

"Probability on Trees and Networks".

# How it works

As mentioned before, the animation may contain thousands of frames (it’s almost always this case when animating Wilson’s uniform spanning tree algorithm), so it’s quite surprising that our program costs only a few seconds and can produce highly optimized images. The key points are:

Find a way to reduce the file size. This is accomplished by maintaining a rectangular region that holds the size and position of current frame and allowing variable mimimum code length for the LZW compression.

Write the frames to the file as quickly as possible. This is accomplished by writing them to a in-memory io file first and then flush the data to disk all at once.

Of course you must know how the GIF89a specification works before you could truly understand the code, for this the best and possibly the only reference you will need is What's in a GIF.