In this chapter, we will see hashes, one of the features that make Perl one of the world's truly great programming languages. Although hashes are a powerful and useful feature, you may have used other powerful languages for years without ever hearing of them. But you'll use hashes in nearly every Perl program you write from now on; they're that important.
In the olden days, we called these "associative arrays." But around 1995 the Perl community decided that this was too many letters to type and too many syllables to say, so we changed the name to "hashes."
A hash is a data structure, not unlike an array in that it can hold any number of values and retrieve them at will. But instead of indexing the values by number, as we did with arrays, we'll look up the values by name. That is, the indices (here, we'll call them keys) aren't numbers, but instead they are arbitrary unique strings (see Figure 5-1).
The keys are strings, first of all, so instead of getting element number 3 from an array, we'll be accessing the hash element named wilma.
These keys are arbitrary strings -- you can use any string expression for a hash key. And they are unique strings -- just as there's only one array element numbered 3, there's only one hash element named wilma.
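To make the contrast with arrays concrete, here is a minimal sketch. The hash name %family_name and its contents are hypothetical, chosen only for illustration, and it uses the `=>` list syntax and the `keys` function, which this chapter covers later. It looks up an element by number in an array, then by name in a hash, builds a key from a string expression, and shows that storing under an existing key replaces that key's value instead of adding a second entry.

```perl
use strict;
use warnings;

# An array is indexed by number: element 3 is the fourth element.
my @names = ('fred', 'barney', 'betty', 'wilma');
print "Element 3 of the array: $names[3]\n";

# A hash is indexed by string keys instead.
# (Hypothetical data, for illustration only.)
my %family_name = (
    fred   => 'flintstone',
    barney => 'rubble',
    wilma  => 'flintstone',
);
print "The hash element named wilma: $family_name{'wilma'}\n";

# Any string expression can serve as a key.
my $who = 'bar' . 'ney';
print "Built key '$who' gives: $family_name{$who}\n";

# Keys are unique: storing under an existing key replaces the old value
# rather than creating a second element named wilma.
$family_name{'wilma'} = 'slate';
print "wilma is now: $family_name{'wilma'}\n";
print "Number of keys: ", scalar(keys %family_name), "\n";   # still 3
```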
Another way to think of a hash is that it's like a barrel of data, where each piece of data has a tag attached. You can reach into the barrel and pull out any tag and see what piece of data is attached. But there's no "first" item in the barrel; it's just a jumble. In an array, we'd start with element 0, then element 1, then element 2, and so on. But in a hash, there's no fixed order, no first element. It's just a collection of key-value pairs.
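Because the barrel has no "first" item, the order in which the `keys` function hands back the keys is not something to depend on. The sketch below assumes a hypothetical %score hash. It shows one common way to get a predictable order anyway: sort the keys yourself.

```perl
use strict;
use warnings;

# Hypothetical data, for illustration only.
my %score = (fred => 205, barney => 195, dino => 30);

# keys() returns the keys in Perl's internal order, which is not
# guaranteed to be any particular or stable order.
my @unordered = keys %score;
print "keys in hash order: @unordered\n";

# When a predictable order is needed, impose one, e.g. by sorting.
my @ordered = sort keys %score;
print "keys in sorted order: @ordered\n";   # barney dino fred
```

The "hash order" line may differ from run to run or between Perl versions; only the sorted line is predictable.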
The keys and values are both arbitrary scalars, but the keys are always converted to strings. So, if you used the numeric expression 50/20 as the key, it would be turned into the three-character string "2.5", which is one of the keys shown in Figure 5-1.
Note that 50/20 here is a numeric expression, not the five-character string "50/20". If we used that five-character string as a hash key, it would simply stay the same five-character string.
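A short sketch of the key conversion just described, using an anonymous-purpose hash %h made up for the example: the expression 50/20 is evaluated to 2.5 and then stringified to "2.5", so it names the same element as the string "2.5", while the literal string "50/20" names a different element.

```perl
use strict;
use warnings;

my %h;

# 50/20 is evaluated first (giving 2.5), then converted to the
# string "2.5" for use as the key.
$h{50/20} = 'from the expression';

# The same element can be reached with the string "2.5"...
print "\$h{'2.5'} is: $h{'2.5'}\n";

# ...but the five-character string "50/20" is a different key entirely.
$h{'50/20'} = 'from the literal string';
print "keys: ", join(', ', sort keys %h), "\n";   # 2.5, 50/20
```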
As usual, Perl's no-unnecessary-limits philosophy applies: a hash may be of any size, from an empty hash with zero key-value pairs, up to whatever fills up your memory.
Some implementations of hashes (such as in the original awk language, from which Larry borrowed the idea) slow down as the hashes get larger and larger. This is not the case in Perl, which uses a good, efficient, scalable algorithm. So, if a hash has only three key-value pairs, it's very quick to "reach into the barrel" and pull out any one of those. If the hash has three million key-value pairs, it should be just about as quick to pull out any one of those. A big hash is nothing to fear.
Technically, Perl rebuilds the hash table as needed for larger hashes. In fact, the name "hashes" comes from the hash table used to implement them.
It's worth mentioning again that the keys are always unique, although the values may be duplicated. The values of a hash may be all numbers, all strings, undef values, or a mixture (or, in fact, any scalar values, including scalar types other than the ones we'll see in this book). But the keys are all arbitrary, unique strings.
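Here is a small sketch of unique keys with duplicated and mixed values. The hash %info and its contents are hypothetical, and the sketch borrows the `exists`, `defined`, and `values` functions. Two keys share the same value, one value is undef, and one is a number; the hash still has exactly four distinct keys. As a side effect it shows a common trick: putting values into a second hash as keys to collapse duplicates.

```perl
use strict;
use warnings;

# Hypothetical data: values may repeat, and may be strings, numbers, or undef.
my %info = (
    fred   => 'flintstone',
    wilma  => 'flintstone',   # same value as fred's: that's allowed
    dino   => undef,          # an undef value is allowed too
    answer => 42,             # and so is a number
);

# Keys stay unique: four distinct keys, even though two values match.
print "distinct keys: ", scalar(keys %info), "\n";   # 4

# The key 'dino' is present even though its value is undef.
print "dino present: ", (exists $info{dino} ? "yes" : "no"), "\n";

# Collapse duplicate values by using them as keys of a second hash;
# undef is skipped here to avoid an "uninitialized" warning.
my %distinct_value;
for my $value (values %info) {
    next unless defined $value;
    $distinct_value{$value} = 1;
}
print "distinct defined values: ", scalar(keys %distinct_value), "\n";  # 2
```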
When you first hear about hashes, especially if you've lived a long and productive life as a programmer using other languages that don't have hashes, you may wonder why anyone would want one of these strange beasts. Well, the general idea is that you'll have one set of data "related to" another set of data. For example, here are some hashes you might find in typical applications of Perl:
So, yet another way to think of a hash is as a very simple database, in which just one piece of data may be filed under each key. In fact, if your task description includes phrases like "finding duplicates," "unique," "cross-reference," or "lookup table," it's likely that a hash will be useful in the implementation.
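As a sketch of the "finding duplicates" and "unique" cases, the hypothetical word list below is tallied in a hash keyed by the word itself. Each distinct word becomes exactly one key, so the keys give the unique words, and any key whose count is above 1 is a duplicate. This is one common idiom, not the only way to do it.

```perl
use strict;
use warnings;

# Hypothetical input: a list of words that may contain repeats.
my @words = qw(fred barney fred wilma barney fred);

# Count each word, using the word as the key. A missing key reads as
# undef, which ++ treats as 0, so no initialization is needed.
my %count;
$count{$_}++ for @words;

my @unique     = sort keys %count;
my @duplicates = grep { $count{$_} > 1 } sort keys %count;

print "unique words: @unique\n";            # barney fred wilma
print "duplicated words: @duplicates\n";    # barney fred
print "fred appears $count{fred} times\n";  # 3
```

The same %count hash also works as a lookup table afterward: `$count{wilma}` answers "how many times did wilma appear?" directly.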
Copyright © 2002 O'Reilly & Associates. All rights reserved.