Monday, March 30, 2015

The Data Structure Barbecue Grill

Charcoal, Fire Starter, Match

Recently, a colleague asked me a question about the implementation of a hash table. I found that I had to stop and think for a moment. My explanation was by no means the smoothest delivery. I tripped over my words a few times and needed to catch myself and backtrack before settling on the correct details. This incident has convinced that it's time for a refresher on one of the fundamental components of computing science: data structures.

Rather than dusting off my old text books and violating a copyright or two, I thought I'd take a new approach inspired by a conversation with yet another colleague of mine. With spring in full bloom here in Vancouver, it's time to pull out a stiff wire brush and begin cleaning the grill. Consider this your invitation to the first ever data structure barbecue.

On A Stick

While my usual preference is for a nice steak, there is one barbecue item that is nearly as much fun as it is delicious: the kebab. The scrumptious kebab is a collection of seasoned meats and vegetables skewered onto a single stick.

How is this tantalizing treat comparable to a data structure? When building a kebab it's impossible to add new pieces to the middle, but poking a new item onto either end is easy. Once cooked, it's far simpler to remove and eat from either end than it is to chew off the center. This matches the properties of a double-ended queue. You can add or remove at either end of the queue, but the items remain in a fixed order and cannot be accessed until found at either end of the queue.

A neat twist on our double-ended kebab is a kebab with a handle on one end. This kebab can only be accessed from a single end; the exact properties of a stack. We push items on one at a time, keeping the initial order until removed. The first item added to the kebab when constructing it is the very last to be removed and eaten. We must wait until all other pieces are gone before finally arrivinge back where we started.

Intestinal Fortitude

Our second course here at the data structure barbecue is a juicy and meaty collection of grilled sausage links. Filled with ground seasoned meats, seasonings, and aromatics, these stuffed links are bursting with flavor. The links of this chain are joined to their nearest neighbors by a thin brand, creating a long collection of heartburn-inducing, artery-clogging satisfaction.

Most of you have probably already figured this one out. This closely resembles the good old-fashioned linked list. Starting at one end, we can journey from link to link all the way to the other end. The order is fixed, but severing just one connection breaks the list into two. Losing a handle on the second half causes the whole chain to be dropped on the floor causing a mess and making it unusable (a memory leak).

Fresh From Chilliwack

I made a special trip down the Fraser Valley for our final course here at the data structure barbecue. Today's finale is grilled corn on the cob. I hope you held onto the toothpick-like skewer from the kebabs, because you're about to sink your teeth into some fresh, sweet corn. This final starchy treat contains a fixed collection of ears of corn. You can start wherever you'd like and easily access any area on demand, choosing to stay in one area or revisit places you've already covered.

The cob of corn closely resembles the classic array. It's very rigid in total size and arrangement of values, but easy to access any part at any time. The constraints of an array are limiting, but enable incredibly quick access to any item within the data structure.

Where's The Hash Table?

Over there next to the table with the macaroni salad, the snacks and all the fixings...

I guess not all things are fit to be grilled. I'll cover hash tables here because they are the essential building block for more advanced data structures such as associative arrays.

A hash table is a clever way to treat non-ordinal values (e.g. strings) as indices. The first thing to do is to define a simple to compute hash function taking the key values as input and computing a numeric hash value. A very terrible hash function could simply return a fixed value (this would result in a data structure equivalent to a linked list in most implementations). An ideal hash function given any random input would output any hash value with an equal chance. Similar input values would result in very different hashes.

Each computed hash value corresponds to a bucket containing all previously stored values. Each bucket may contain many results with the same hash value. Though there are a variety of implementations, from here the collection should be small and much more manageable. Traversing the collection, we look for a matching index and the associated value. In this way, hash tables allow us to quickly store and access a wide variety of data types using limited memory and direct object comparisons.

Come Again Soon

I hope you all enjoyed the first ever data structure barbecue grill. Please feel free to take home seconds and share with your friends. I hope the analogies weren't too labored. Please don't try to push them too far; I'm sure they'll break.

What are your favorite clever data structure analogies? Let me know in the comments.


Joshua Ganes