Andrew

March 7, 2007

When Bad Hashing Means Good Caching

Filed under: Hashing,Programming — floodyberry @ 1:54 pm
Tags:

I was testing various string hashing algorithms on chained hash tables, primarily to look at the bucket distribution and number of key comparisons with both prime and power of 2 sized tables. Each table node was set up to remember it’s full hash value so bucket collisions would only drop to a key comparison on a true key collision. I initially wasn’t concerned with run times, but I tacked on a timer anyway so I could get a quick metric on how collisions and distribution were affecting performance and wound up running in to a rather odd situation. For the purposes of this article, the tables were also pre-sized before inserting keys to avoid resizing overhead.

While I was testing many algorithms, only two are needed for this article: The string hash from STLPort 4.6.2 and Paul Hsieh’s SuperFastHash. STLPort’s hash is simply “hash = ( hash * 5 ) + string[ i ]” which should be fairly fast but produce weak distribution on large amounts of clustered keys. SuperFastHash should also be quite fast while having much stronger hashing no matter the key size or clustering.

The dataset that caused the problem was a simple 6 character sequential permutation in the form of:

[ABCDEFGH][EFGHIJKL][IJKLMNOP][MNOPQRST][QRSTUVWX][UVWXYZ01]

e.g.

AEIMQU
AEIMQV
AEIMQW
AEIMQX
AEIMQY
AEIMQZ
AEIMQ0
AEIMQ1
AEIMRU
AEIMRV
AEIMRW
AEIMRX
….
HLPTWZ
HLPTW0
HLPTW1
HLPTXU
HLPTXV
HLPTXW
HLPTXX
HLPTXY
HLPTXZ
HLPTX0
HLPTX1

(262144 unique keys in all)

I was hoping to ferret out the weaker algorithms with this dataset and boy did I ever. Almost all the stats were what I expected: STLPort had ~10x bucket density for both prime/pow2 table sizes (i.e. an average of 10 extra links per filled bucket), SuperFastHash had 0.77x for prime and 0.64x for pow2; STLPort had 1.5 million key compares due to full key hashes colliding, SuperFastHash had 0.006 million (6000). Now this is where it got weird: The STLPort powered hash table ran ~55% faster on my AMD64 3200+ than the SuperFastHash powered hash table. The results were similar on my AMD 1.1ghz TBird with the STLPort hash table running ~36% faster than SuperFastHash. The raw difference on both CPUs was around the same, 90ms vs 140ms on the AMD64 and 190ms vs 260ms on the Tbird. Something was definitely up.

Initial profiling was not very helpful. The STLPort hash algorithm ran faster than SuperFastHash, but the total difference was a few milliseconds at most even though SuperFastHash’s weakness is in smaller keys where it can’t take advantage of it’s streamlined innerloop. As far as the Insert method for the hash table, the 1.5 million string compares took ~17ms which gave SuperFastHash a slight advantage when checking for duplicates, but even when I stripped out the duplicate checking code (all the keys were unique so it was safe to do so) STLPort still had a healthy advantage.

With the hash functions coming out equal and the duplicate checking code taken out, there really wasn’t much left to the method at all. The offending statements came down to

node = new Node( key, value, *head, hash );
*head = node;

which simply allocates a new node with a simple constructor (key, value, next, hash) and points the head of the bucket to our new node. Since the hashtable is using <string *, int> for it’s key/value pair, this is just couple dword copies to initialize the node and update the bucket list, so how could this be slowing our code down so badly? I suspected the allocator might have something to do with it after noticing that running multiple tests in a single session resulted in increased runtime for each successive test (due to accumulated news/deletes?) for VS7 and to a lesser degree for gcc 3.4.4. I dropped in the Hoard Memory Allocator and while the code sped up a bit and multiple tests were possible in a single run with no degredation, STLPort still held the same lead (STLPort down to 55ms and SuperFastHash down to 105ms). I was about at my wits end when I decided to dump the actual bucket indices for each key to see where they were being placed:

STLPort:
273830
273831
273832
273833
273834
273835
273793
273794
273835
273836

SuperFastHash:
4267
391151
77466
272914
376435
77856
225677
489782
292888
97819
9443

Whoops. STLPort’s bucket accesses were nearly linear and all clustered around the same area (only the middle 1/10th of the buckets were being used), while SuperFastHash’s accesses were (properly) random across the entire list. STLPort’s string comparisons were also linearly clustered due to being allocated and inserted in alphabetic order:

CKKNTX – CJPNTX
CKKNTX – CJOTW0
CKKNTX – CJOSTX
CKKNTY – CKJTW1
CKKNTY – CKJSTY
CKKNTY – CJPPR1
CKKNTY – CJPOW1
CKKNTY – CJPNTY
CKKNTY – CJOTW1
CKKNTY – CJOSTY
CKKNTZ – CKJSUU
CKKNTZ – CKJSTZ
CKKNTZ – CJPNUU
CKKNTZ – CJPNTZ
CKKNTZ – CJOSUU
CKKNTZ – CJOSTZ

When the list was randomized and inserted again (under Hoard), SuperFastHash stayed at a cool 105ms while STLPort shot up to 460ms. STLPort gained ~360ms from the now non-linear string comparisons and ~40ms from the non-linear bucket accesses. Same number of compares and resulting bucket contents as with the linear dataset, but now with huge penalties for STLPort’s cache misses.

Lessons learned

  • Test results can be meaningless if you don’t understand exactly what you’re testing.
  • Unless one of the algorithms being tested is explicitly taking advantage of cache lines, it is possible to get highly bogus results if you don’t generate your test data well. Before this I hadn’t even considered that caching could have this kind of effect on chained hashtables.
  • Hashing algorithms giving the fastest possible time for a particular dataset isn’t as important as giving the least worst time unless you are certain it will absolutely not be used for anything else. A flawed algorithm can even be dangerous as illustrated by Scott Crosby and Dan Wallach’s Denial of Service via Algorithmic Complexity Attacks.

Source Code

You may download the C++ Source Code demonstrating the problem and try it out for yourself. While the STLPort execution time should balloon when switching to the randomized dataset, it won’t always be faster than SuperFastHash on the linear dataset. On a K6-3 450mhz STLPort is actually slower: 580/1500ms for the linear/random datasets versus 500/500ms for SuperFastHash.

Example output on my AMD64 3200+:

This software uses the Hoard scalable memory allocator (version 3.5.1, libhoard).
Copyright © 2005 Emery Berger, The University of Texas at Austin, and the University of Massachusetts Amherst.
For more information, see http://www.hoard.org
starting..
elapsed for STLPort-4.6.2/string-6-linear PRIME 63, comps 1581380, sdev 10.345448, active bins 27357, items 262144

elapsed for STLPort-4.6.2/string-6-linear POW2 63, comps 1581380, sdev 10.345448, active bins 27357, items 262144
elapsed for STLPort-4.6.2/string-6-random PRIME 484, comps 1581380, sdev 10.345448, active bins 27357, items 262144
elapsed for STLPort-4.6.2/string-6-random POW2 485, comps 1581380, sdev 10.345448, active bins 27357, items 262144

elapsed for Hsieh/string-6-linear PRIME 94, comps 6145, sdev 0.775544, active bins 188020, items 262144
elapsed for Hsieh/string-6-linear POW2 94, comps 6145, sdev 0.639169, active bins 202819, items 262144
elapsed for Hsieh/string-6-random PRIME 109, comps 6145, sdev 0.775544, active bins 188020, items 262144
elapsed for Hsieh/string-6-random POW2 110, comps 6145, sdev 0.639169, active bins 202819, items 262144

Advertisements

3 Comments »

  1. did you just wrote a bunch of posts in a day or my rss reader sucks?

    Comment by makc3d — October 7, 2009 @ 4:09 pm | Reply

  2. March 7, 2007 – I guess it’s latter :(

    Comment by makc3d — October 7, 2009 @ 4:09 pm | Reply

  3. Yeah, just one new post, nothing backposted or anything.

    Comment by floodyberry — October 8, 2009 @ 3:55 am | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Create a free website or blog at WordPress.com.

%d bloggers like this: