A go database program
Currently I am working on a hashing algorithm for center patterns for libkombilo. See also this related discussion on Sensei's about this topic. Here I just want to share some observations (there are similar observations on the Sensei's page, but based on a smaller database). The numbers given below are based on a database of around 50,000 high-dan games from KGS.

I extracted all 4x4 patterns from these games which are "in the center" (i.e. do not touch one of the sides of the board). There are 3^16 = 43046721 possible patterns. (Taking symmetry and illegal patterns into account, this number is reduced drastically.) In the database, 4912691 patterns actually occur. Among those, 1489238 occur in exactly one game, and 3252710 in five or fewer games. On the other hand, 6903 patterns occur in more than 1,000 games, and 613 occur in more than 10,000 games. Of course, the most frequent pattern is the "empty pattern", which occurs in 14x14=196 spots in every non-handicap game.
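To make the counting above concrete, here is a minimal Python sketch. It is not the libkombilo hashing algorithm: the names `center_offsets` and `encode`, the board representation (a 19x19 list of lists with 0 = empty, 1 = black, 2 = white) and the base-3 encoding are assumptions made for illustration only, and symmetry is ignored.

```python
BOARD_SIZE = 19
PATTERN_SIZE = 4

# Each of the 16 points is empty, black, or white.
POSSIBLE_PATTERNS = 3 ** (PATTERN_SIZE * PATTERN_SIZE)


def center_offsets(board_size=BOARD_SIZE, pattern_size=PATTERN_SIZE):
    """Top-left corners of all windows that do not touch the edge lines."""
    # Rows/columns are numbered 0..board_size-1; a center window must stay
    # within 1..board_size-2, so its top-left corner is at most `last`.
    last = board_size - 1 - pattern_size
    return [(r, c) for r in range(1, last + 1) for c in range(1, last + 1)]


def encode(board, row, col, pattern_size=PATTERN_SIZE):
    """Encode the window at (row, col) as a base-3 integer (hypothetical scheme)."""
    code = 0
    for r in range(row, row + pattern_size):
        for c in range(col, col + pattern_size):
            code = code * 3 + board[r][c]
    return code


if __name__ == "__main__":
    empty = [[0] * BOARD_SIZE for _ in range(BOARD_SIZE)]
    print(POSSIBLE_PATTERNS)      # number of raw 4x4 patterns
    print(len(center_offsets()))  # number of center windows on a 19x19 board
    print({encode(empty, r, c) for r, c in center_offsets()})
```

On an empty board every center window gets the same code, which is why the empty pattern shows up in all 196 spots at the start of a non-handicap game.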
It seems reasonable to consider in the libkombilo algorithm all patterns that occur in more than 50 and fewer than 5000 games - that would be 199625 patterns, and 38828416 "instances" (where by instance I mean one occurrence of such a pattern in a specific game). This should cover all patterns which are relevant for "every-day searches", and searching for those patterns will be very fast.
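The selection described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names `count_patterns` and `select_patterns` and the input format (pairs of a game id and the pattern codes seen in that game) are hypothetical, and every occurrence is counted as an instance, including repeats within the same game, which may differ from how the numbers in this post were produced.

```python
from collections import defaultdict


def count_patterns(games):
    """games: iterable of (game_id, iterable of pattern codes seen in that game)."""
    games_per_pattern = defaultdict(set)  # pattern code -> ids of games containing it
    instances = defaultdict(int)          # pattern code -> total number of occurrences
    for game_id, codes in games:
        for code in codes:
            games_per_pattern[code].add(game_id)
            instances[code] += 1
    return games_per_pattern, instances


def select_patterns(games_per_pattern, min_games=50, max_games=5000):
    """Keep patterns occurring in more than min_games and fewer than max_games games."""
    return {
        code
        for code, game_ids in games_per_pattern.items()
        if min_games < len(game_ids) < max_games
    }


if __name__ == "__main__":
    # Tiny made-up input, with thresholds scaled down accordingly.
    toy_games = [("g1", [1, 2, 2]), ("g2", [1, 3]), ("g3", [1, 2])]
    per_pattern, instances = count_patterns(toy_games)
    selected = select_patterns(per_pattern, min_games=1, max_games=3)
    print(selected, sum(instances[code] for code in selected))
```

Patterns outside the band are either too rare to be worth indexing or, like the empty pattern, too common for a hash lookup to narrow a search much.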
The figure shows part of the graph of the function "n maps to the number of patterns which occur in precisely n instances".
Posted by ug on February 27, 2007
ug wrote on March 2, 2007:
Correction: the database used for producing these numbers and the figure does not include the GoGoD games (contrary to what I first wrote), but "only" the 50,000 games from KGS.