The Critters Go Board Generator
Michael Wing
January 15, 2008

	Ive wanted to develop a fast go board for many years, and have spent the last few months helping write a go board generator to figure out what fast means. The current code was inspired in part by discussions on the computer-go email.

	CGBG generates many go board implementations: 4 methods of tracking liberties, 3 methods of tracking adjacent groups, 2 methods for mapping a location to a chain. It also can generate code with and without borders, and for many board sizes. It generates ptest, which runs 2 performance tests: filling the board to 90% full (which resembles Monte Carlo readouts) and ladder reading (which resembles local board analysis). It generates ftest, which runs many functional unit tests. 

	Caveats. CGBG was intended to create apples-to-apples comparisons, but does not strictly do so. First, the current implementations are solid, but not necessarily optimal. For example, union-find uses classic 2-pass path compression, rather than the 1-pass version in libego. Also, Many other optimizations (loop unrolling) are not implemented. This is alpha code. Many ideas are just sketches. There is a lot of cruft.

Main conclusions:
	* GnuGo data structures are pretty darn good, even for Monte Carlo simulation. I had hoped to find a linked-list data structure, but didnt.
	* Behavior at 9*9 can be very different than behavior at 19*19.
	* The complex nature of cache and processor architecture makes it very hard to predict the effect of any data structure decision. Trying it out empirically is the only way.
	* Many reasonable implementations run within a factor or 33% or so of the best.

Main conclusions for 19*19:
	* Using union-find is slower than changing the smaller chain, by a tiny amount.
	* Using arrays to track liberties is best for board analysis.
	* Using pseudo-liberties only is best for pure simulation.
	* Using arrays to track adjacent groups is best for board analysis.
	* Using search-only to track adjacent groups is best for pure simulation.
	* For a program that does both Monte Carlo and local board analysis, I would use arrays to track both liberties and adjacencies.
	* For a program that does no local analysis, I would use pseudo liberties only.

Other Notes
	* CGBG precomputes board and hash data, so they can be stored in read-only memory and do not need to be initialized. This can be used as an example, or cut and paste into other implementations.
	* This implements positional superko checking (though not tested), which GnuGo and libego do not. Situational superko checking is sketched, but not complete.
	* The play() routine uses gotos. The conventional technique of deciding the status of a move and then using a switch statement is about 1 percent slower.

The code is written in Ruby, and currently generates C code for gcc under cygwin. It also needs to link in SFMT. It is available at www.swcp.com/~wing/cgbg


To Use
	* ./cgbg <config_file>
	* make
	* ./ftest
	* ./ptest

I would appreciate any feedback.
Michael Wing

---------------------------

This is the result of instrumenting the code:

play        calls    =   3292082
normal_move calls    =   3240000
main loop neighbors  =  12291074
main loop empty      =   6798595
main loop same color =   2804947
main loop diff color =   2687532
liberties add call   =  14548081
liberties add real   =  10736179
adjacencies add call =   5375064
adjacencies add real =   3964336
main loop merge      =   2524434
main loop capture    =    163120
liberty or adjacency 0
	0 => 4793839
	1 => 3885479
	2 => 2935332
	3 => 1940006
	4 => 1072253
	5 => 810316
	6 => 532638
	7 => 368774
	8 => 253656
	9 => 176892
	10 => 126672
	11 => 90491
	12 => 66060
	13 => 48008
	14 => 34793
	15 => 25707
	16 => 19055
	17 => 14008
	18 => 10213
	19 => 7589
	20 => 5531
	21 => 4104
	22 => 2967
	23 => 2184
	24 => 1607
	25 => 1177
	26 => 865
	27 => 584
	28 => 401
	29 => 302
	30 => 183
	31 => 142
	32 => 116
	33 => 85
	34 => 48
	35 => 32
	36 => 18
	37 => 8
	38 => 13
	39 => 2
	40 => 3
	41 => 2
	42 => 1
	43 => 0
	44 => 0
	45 => 0
	46 => 0
	47 => 0
	48 => 0
	49 => 0

* Making lib and adj sets dense in first 4 elements covers 83% of cases
  * There are 14.0m inserts into sets with 0-4 present values
  * There are  2.5m inserts into sets with 5+ present values
* Making lib and adj sets dense in first 8 elements covers 97% of cases
  * There are 16.3m inserts into sets with 0-8 present values
  * There are  0.5m inserts into sets with 9+ present values
 
* Basic observations:
  * 1/2 of all neighbor loops have empty neighbor
  * 1/4 of all neighbor loops have same color
  * 1/4 of all neighbor loops have different color

  * 1.3% of all neighbor loops have capture
  * 96% of all diff color loops have a merge

  * 2/3 to 3/4 of all inserts are real

* Basic conclusions:
  * Merge needs all the work!