NAHT: Not A Hash Table 
Joshua Reich 2006
spam@i2pi.com

DESIGN
------

Hash tables are pretty cool. They let you store data, without really knowing the distribution
of the data in advance. Given a good hashing function, and a large enough table, searching 
should be O(1). However, in most cases it approaches O(x/2) where x is the average length of 
the chain which dangles off the end of each hash bucket. To dynamically reduce this length,
a good implementation will keep the bucket fill rate below 70%, or so. This should ensure that
x is close to 1. However, to achieve this, hash table libraries need to periodically re-hash
their contents. This can be slow. Additionally, chains are usually maintained as linked lists.
This adds both space and computation overhead when traversing a chain. Not to mention the
problems they cause for CPU cache hit rates. 

Not A Hash Table is not a hash table implementation, but is very similar. It has some advantages,
and drawbacks. Your milage may vary. 

The key difference in this data structure is that we use a 2D array to store our data. It is 
fair to think of each row in this array as a hash bucket, with the chains extending along each
row. From this point, the operation is quite similar to a hash table. When looking for a piece
of data, you first hash the contents. NAHT does not provide hashing functions, but I personally
use Bob Jenkin's page as a reference (http://burtleburtle.net/bob/hash/doobs.html).

Once the hash is calculated, naht converts this unsigned long to a 'bucket' number. This is done
by bitwise ANDing against one minus the total number of buckets. This total number of buckets
is always of the form 2^b. This avoids an expensive MOD operation. 

Now that we have located our 'bucket', we search for our data item, starting with the item that
was most recently added. We do this as typical access patterns will often search for recently
added items. 

To insert, we have 2 routes. Firstly, if we have yet to use the entire space allocated for the 
chain, we simply add the item to the end of the chain. However, if we have already used all the
space available (the width of the 2D array), we must replace some item with the new one that we
wish to add. In this case, we over-write the least recently added item. This means that in such
cases, if were then to do a search for that replaced item, it would not be found, even though it
was previously stored. 

If you can't afford to lose items that were previously stored, the discard() function is provided.
This function is called whenever we are about to replace an item with a new item, with the to-be-
replaced item as a parameter. You can then use this callback to store the item elsewhere, perhaps
in a database.

This is the major downside to this algorithm. To get optimal performance, you need to understand
the trade off between naht array size (as specified in the mb parameter to naht_init), and the
cost of offline storage.

COMPILATION
-----------

The Makefile provided is purely for demonstration, running 'make' will build some test code. 
To use naht in your code, build a library, or simply copy naht.c and naht.h to your source
directory. 

The only thing to note is for thread saftey, you should add the -DTHREAD_SAFE flag.

If you are not using naht in a multi-threaded environment, its best to leave this flag out.

naht has been tested on:
- Mac OS X (PPC)
- Linux (AMD 64)
- Solaris (SPARC)

USAGE
-----

Read test.c for an example.

OBTAINING
---------
http://i2pi.com/~josh/naht/
