Standard Library News: Sets and Maps

. Herb Sutter is Chief Technology Officer at PeerDirect and the principal architect of its heterogeneous database replication product. He is also PeerDirect's representative at the ISO/ANSI C++ and SQL committees, and the author of the book Exceptional C++ and the forthcoming Peer-Distributed Databases and C++ Guru of the Week (Addison-Wesley Longman). He can be contacted at [email protected].


THIS IS THE second of two columns covering updates and additional information about standard library containers. In my previous column, I presented news updates about vector and deque. This time, the focus is on more news about the associative containers set, multiset, map, and multimap.*

ASSOCIATIVE CONTAINERS Here are some key points about what it means to iterate over an associative container, and about what associative containers' iterators do and are.

Each of the standard asociative containers internally maintains its elements in a way that allows fast traversal in nondescending key order and fast retrieval by key, where the keys are always compared according to the provided Compare type (this defaults to less<Key>, which uses the operator<() provided for Key objects). Whenever a new key is inserted into a container, the container finds a proper place to insert the new key so that it maintains the proper ordering of its internal data structure.

Iterators are easy to implement because they are supposed to traverse the entries in nondescending Key order, and the container already internally stores the elements in a way that naturally supports that ordering. Pretty basic, right?

THE KEY REQUIREMENT An essential requirement, without which associative containers could not work reliably at all, is this: Once a key has been inserted into the container, that key had better not be changed in any way that would change its relative position in the container. If that ever did happen, the container wouldn't know about it and its assumptions about the ordering of its entries would be violated, searches for valid entries could fail, iterators would no longer be guaranteed to traverse the contents in key order, and bad things would happen in general.

A MOTIVATING EXAMPLE For example, consider a map<int,string> named m that has the contents shown in Figure 1; each node within m is shown as a pair<int,string>. I'm showing the internal structure as a binary tree because this is what all current standard library implementations actually use.

figure 1

Figure 1. Sample internal contents of a map<int, string>.

As the keys are inserted, the tree's structure is maintained and balanced in such a way that a normal inorder traversal visits the keys in the usual less<int> ordering. So far, so good.

But now say that, through an iterator, we could arbitrarily change the second entry's key, using code that looked something like the following:

// Example 1(a): Wrong way to change a key
// in a map<int,string> m.
map<int,string>::iterator i = m.find( 13 );
if( i != m.end() )
{
    const_cast<int& >( i->first ) = 9999999;
 // oops!
}
Note that we have to cast away const to get this code to compile. The problem here is that the given code is interfering with the map's internal representation by changing the map's internals in a way that the map isn't expecting and can't deal with (see Fig. 2).

Example 1(a) corrupts the map's internal structure. Now, for example, an iterator traversal will not return the contents of the map in key order, as it should. For example, a search for key 144 will probably fail, even though the key exists in the map. In general, the container is no longer in a consistent or usable state. Note that it is not feasible to require the map to defend itself against such illicit usage automatically, because it can't even detect this kind of change. In Example 1(a), the change was made through a reference into the container, without calling any map member functions.

Note that, if the Example 1(a) code had changed the key in such a way that the relative ordering remained unchanged, there would have been no problem. The problem is only with code that attempts to change the relative ordering of keys once they are in the container.

Ideally, we would like to prevent this through a suitable coding standard. What, then, should such a coding standard say?

Option #1: Say "const Means const!" (Insufficient) In Example 1(a), we needed to cast away const to change the key. This is because Standard C++ actively tries to prevent code that changes the relative ordering of keys. Standard C++ enforces a seemingly stricter requirement, namely that, for maps and multimaps, keys should not be modified at all. A map<Key, Value>::iterator points to a pair<const Key, Value> that, apparently, lets you modify the value part, but not the key part, of the pointed-at map entry. This, again apparently, prevents a key from changing at all, much less changing in a way that would alter its position in the map object's internal ordering.

One option, then, is to print large posters that say "const means const!" and hold rallies with celebrities and incite mass media coverage to increase awareness of this deplorable problem. This would prevent code like that in Example 1(a), which doesn't work in practice anyway, wouldn't it?

It's a good idea, but unfortunately telling people to respect const isn't enough. For example, if the Key type has a mutable member that affects the way Compare compares Key objects, then calling const member functions on a const key can still change the relative ordering of keys.

Option #2: Say "Always Change Keys Using Remove-Then-Insert" A better, but still insufficient, solution is to follow the discipline: "To change a key, remove it and reinsert it." For example:

// Example 1(b): Better way to change a key
//  in a map<int,string> m.
//
map<int,string>::iterator i = m.find( 13 );
if( i != m.end() )
{
 string s = i->second;
 m.remove( i );
 m.insert( make_pair( 9999999, s ) ); // OK
}
This is better because it avoids any change to keys, even keys with mutable members that are significant in the ordering. It even works with our specific example. So this must be the solution, right?

figure 2

Figure 2. Figure 1 after Example 1(a) executes... Problem!

Unfortunately, it's still not enough in the general case, because keys can still be changed while they are in the container. "What?" one might ask. "How can keys be changed while they're in the container, if we adopt the discipline of never changing key objects directly?" Here are two counterexamples:

  • If the Key type has state that is visible to other code (for example, if Key contains a pointer to a shared buffer that can be modified by other parts of the system without going through the Key object), and that affects the way Compare compares Key objects, then changing the visible state can still change the relative ordering of keys. Note that this change can happen without the knowledge of the code that uses the associative container, and therefore without a remove-then-reinsert operation.

  • Consider a Key type of string and a Compare type that interprets the key as a file name and compares the contents of the files. It's obvious that even if the keys are never changed, the relative ordering of keys can still change if the files are modified by another process, or (if the file is shared on a network) even by a user on a different machine on the other side of the world.
WHAT ABOUT set? This brings us to set, because since we haven't found a good answer yet we might as well look to see what set does about it. Now, because a set can be thought of as just a map<Key, Value> without a Value type, you might expect that set would do much the same thing with its key type as map does, and that a set<Key>::iterator and a set<Key>::const_iterator would be the same thing—that is, that they would both point to a const Key. This would prevent a programmer from modifying the pointed-at key.

This is not the case, however, and set::iterator was deliberately designed to point at a non-const key because the contained value might contain other information that does not affect comparison.

The standard treats set and map keys differently, and seemingly contradictorily, because sets and maps are intended to be used differently. For example, say that you want to implement a lookup that, given a customer's name, finds the customer's address or phone number. You might implement a map<string, CustData> where the key is the customer's name, and the value is a CustData structure that contains address and telephone information.

But what if you already have a perfectly good Customer class that contains the customer's name and information? Then it might seem a waste of programmer time to add the complexity of yet another CustData structure that contains all of the same information except for one name string. On the other hand, it would be wasteful of runtime resources to implement a map<string, Customer> where the key is a redundant copy of the customer's name already being stored inside the Customer object—we really shouldn't have to store that name twice. Instead, you would naturally think of implementing a set<Customer> that needs no new data structure and incurs no redundant string overhead. And both a map<string, Customer> and a set<Customer> will let you change the Customer object once it's in the container. However, the set<Customer> method requires that you not change a Customer object in such a way that would change its relative ordering in the container.

So, noting that our first two attempts to write a more specific rule didn't cover all the cases, and noting that even set and map handle things differently, perhaps we can do no better than a general rule.

THE KEY REQUIREMENT (REPRISE) So what rule should we follow to ensure that we're using associative containers correctly? Although it would be nice to codify more specific discipline, const alone won't do it (and set and map have inconsistent "key constness" policies), and simple coding rules alone won't do it. So we can't be much more specific than to simply state the requirement and add: "You have to know what you're doing."

The Associative Container "Key Rule":
Once a key has been inserted into an associative container, that key must never change its relative position in the container.

Be aware of the direct and indirect ways in which a key might change its relative order, and avoid those practices. You'll also be avoiding problems with the standard associative containers.

IN THE MAIL In my previous column,1 I recommended that you consider using deque by default instead of vector, on the grounds that deque is easier to use because it's naturally more efficient for growth, and at the same time deque isn't much slower across the board in real-world implementations of the standard library.1 In the column2 before that, I criticized the standard vector<bool> specialization because it forces on all users the choice of optimizing for space at the possible expense of speed, and because it breaks the standard container requirements.

Astute reader Mark Frey wrote me about a seeming dichotomy in this advice. In part, he wrote:

"I find it a little bit hard to reconcile your complaint that this specialization [vector<bool>] sacrifices speed for space savings on one hand, with your recommendation that std::deque be preferred because of its friendly memory use despite its performance penalty."

This is a good point, and others may be wondering about this, too, so let me explain why I don't think I'm being inconsistent. I'm actually following a basic principle of optimization:

Don't optimize prematurely: Write for clarity and accuracy first, and only optimize once a profiler tells you when and where it's needed.

From this principle we can draw both conclusions:

  1. The vector<bool> specialization is a classic premature optimization in that it's forced on all users regardless of their programming environment and context, and is rightly criticized on those grounds. (It also has a bigger problem that's not related to optimization: It breaks container requirements and therefore can't be used reliably with generic algorithms.)

  2. The advice to "prefer deque" was based on the principle of writing for clarity and accuracy first. When you know, by measuring, that the deque is causing a performance bottleneck, it should of course be replaced with something faster. But in general it's best to prefer habits that encourage simplicity and correctness first, and here deque helps out because it eliminates the need to know about and consider vector's reserve() baggage when growing vectors.
References
  1. Sutter, H. "Standard Library News, Part 1: Vectors and Deques," C++ Report, 11(7), July/Aug. 1999.
  2. Sutter, H. "When Is a Container Not a Container?" C++ Report, 11(5), May 1999.
* Note that set and multiset aren't really "associative" in the usual sense of relating a lookup key with another value, like a dictionary does. That's the job of map and multimap. But all four of these containers appear in the C++ standard under the heading "Associative Containers," so I'll follow suit to be consistent.