THIS IS THE second of two columns covering updates and additional information about standard library
containers. In my previous column, I presented news updates about vector
and deque. This time, the focus is on more news about the associative
containers set, multiset, map, and
multimap.*
ASSOCIATIVE CONTAINERS Here are some key points about what it means to iterate over an associative
container, and about what associative containers' iterators do and are.
Each of the standard asociative containers internally maintains its elements in a way that allows fast
traversal in nondescending key order and fast retrieval by key, where the keys are always compared
according to the provided Compare type (this defaults to
less<Key>, which uses the
operator<() provided for Key
objects). Whenever a new key is inserted into a container, the container finds a proper place to insert the
new key so that it maintains the proper ordering of its internal data structure.
Iterators are easy to implement because they are supposed to traverse the entries in nondescending
Key order, and the container already internally stores the elements
in a way that naturally supports that ordering. Pretty basic, right?
THE KEY REQUIREMENT An essential requirement, without which associative containers could not work
reliably at all, is this: Once a key has been inserted into the container, that key had better not be changed
in any way that would change its relative position in the container. If that ever did happen, the container
wouldn't know about it and its assumptions about the ordering of its entries would be violated, searches for
valid entries could fail, iterators would no longer be guaranteed to traverse the contents in key order, and
bad things would happen in general.
A MOTIVATING EXAMPLE For example, consider a map<int,string>
named m that has the contents shown in Figure 1; each node within
m is shown as a pair<int,string>.
I'm showing the internal structure as a binary tree because this is what all current standard library
implementations actually use.
Figure 1. Sample internal contents of a map<int, string>.
As the keys are inserted, the tree's structure is maintained and balanced in such a way that a normal inorder
traversal visits the keys in the usual less<int> ordering. So far,
so good.
But now say that, through an iterator, we could arbitrarily change the second entry's key, using code that
looked something like the following:
// Example 1(a): Wrong way to change a key
// in a map<int,string> m.
map<int,string>::iterator i = m.find( 13 );
if( i != m.end() )
{
const_cast<int& >( i->first ) = 9999999;
// oops!
}
Note that we have to cast away const to get this code to compile. The
problem here is that the given code is interfering with the map's internal representation by changing the
map's internals in a way that the map
isn't expecting and can't deal with (see Fig. 2).
Example 1(a) corrupts the map's internal structure. Now, for example, an
iterator traversal will not return the contents of the map in key order,
as it should. For example, a search for key 144 will probably fail, even though the key exists in the
map. In general, the container is no longer in a consistent or usable
state. Note that it is not feasible to require the map to defend itself against such illicit usage
automatically, because it can't even detect this kind of change. In Example 1(a), the change was made
through a reference into the container, without calling any map member
functions.
Note that, if the Example 1(a) code had changed the key in such a way that the relative ordering remained
unchanged, there would have been no problem. The problem is only with code that attempts to change the relative
ordering of keys once they are in the container.
Ideally, we would like to prevent this through a suitable coding standard. What, then, should such a coding
standard say?
Option #1: Say "const Means const!"
(Insufficient) In Example 1(a), we needed to cast away const to change
the key. This is because Standard C++ actively tries to prevent code that changes the relative ordering of keys.
Standard C++ enforces a seemingly stricter requirement, namely that, for maps
and multimaps, keys should not be modified at all. A
map<Key, Value>::iterator points to a
pair<const Key, Value> that, apparently, lets you modify the value
part, but not the key part, of the pointed-at map entry. This, again
apparently, prevents a key from changing at all, much less changing in a way that would alter its position
in the map object's internal ordering.
One option, then, is to print large posters that say "const means
const!" and hold rallies with celebrities and incite mass media coverage
to increase awareness of this deplorable problem. This would prevent code like that in Example 1(a), which
doesn't work in practice anyway, wouldn't it?
It's a good idea, but unfortunately telling people to respect const isn't
enough. For example, if the Key type has a
mutable member that affects the way
Compare compares Key objects,
then calling const member functions on a
const key can still change the relative ordering of keys.
Option #2: Say "Always Change Keys Using Remove-Then-Insert" A better, but still insufficient,
solution is to follow the discipline: "To change a key, remove it and reinsert it." For example:
// Example 1(b): Better way to change a key
// in a map<int,string> m.
//
map<int,string>::iterator i = m.find( 13 );
if( i != m.end() )
{
string s = i->second;
m.remove( i );
m.insert( make_pair( 9999999, s ) ); // OK
}
This is better because it avoids any change to keys, even keys with mutable members that are significant in
the ordering. It even works with our specific example. So this must be the solution, right?
Figure 2. Figure 1 after Example 1(a) executes... Problem!
Unfortunately, it's still not enough in the general case, because keys can still be changed while they are
in the container. "What?" one might ask. "How can keys be changed while they're in the container, if we adopt
the discipline of never changing key objects directly?" Here are two counterexamples:
- If the Key type has state that is visible to other code
(for example, if Key contains a pointer to a shared buffer that can
be modified by other parts of the system without going through the Key object),
and that affects the way Compare compares
Key objects, then changing the visible state can still change the
relative ordering of keys. Note that this change can happen without the knowledge of the code that uses
the associative container, and therefore without a remove-then-reinsert operation.
- Consider a Key type of string and a
Compare type that interprets the key as a file name and compares the
contents of the files. It's obvious that even if the keys are never changed, the relative ordering of keys
can still change if the files are modified by another process, or (if the file is shared on a network) even
by a user on a different machine on the other side of the world.
WHAT ABOUT set? This brings us to
set, because since we haven't found a good answer yet we might as well
look to see what set does about it. Now, because a
set can be thought of as just a map<Key,
Value> without a Value type, you might expect that
set would do much the same thing with its key type as
map does, and that a
set<Key>::iterator and a
set<Key>::const_iterator would be the same thingthat is, that
they would both point to a const Key. This would prevent a programmer
from modifying the pointed-at key.
This is not the case, however, and set::iterator was deliberately designed
to point at a non-const key because the contained value might contain other
information that does not affect comparison.
The standard treats set and map keys
differently, and seemingly contradictorily, because sets and
maps are intended to be used differently. For example, say that you
want to implement a lookup that, given a customer's name, finds the customer's address or phone number.
You might implement a map<string, CustData> where the key
is the customer's name, and the value is a CustData structure that
contains address and telephone information.
But what if you already have a perfectly good Customer class that
contains the customer's name and information? Then it might seem a waste of programmer time to add the
complexity of yet another CustData structure that contains all of
the same information except for one name string. On the other hand, it would be wasteful of runtime
resources to implement a map<string, Customer> where the key
is a redundant copy of the customer's name already being stored inside the
Customer objectwe really shouldn't have to store that name twice.
Instead, you would naturally think of implementing a set<Customer>
that needs no new data structure and incurs no redundant string overhead. And both a
map<string, Customer> and a
set<Customer> will let you change the
Customer object once it's in the container. However, the
set<Customer> method requires that you not change
a Customer object in such a way that would change its relative
ordering in the container.
So, noting that our first two attempts to write a more specific rule didn't cover all the cases, and
noting that even set and map
handle things differently, perhaps we can do no better than a general rule.
THE KEY REQUIREMENT (REPRISE) So what rule should we follow to ensure that we're using associative
containers correctly? Although it would be nice to codify more specific discipline,
const alone won't do it (and set
and map have inconsistent "key constness"
policies), and simple coding rules alone won't do it. So we can't be much more specific than to simply state
the requirement and add: "You have to know what you're doing."
The Associative Container "Key Rule":
Once a key has been inserted into an associative container, that key must never change its relative position
in the container.
|
Be aware of the direct and indirect ways in which a key might change its relative order, and avoid those
practices. You'll also be avoiding problems with the standard associative containers.
IN THE MAIL In my previous column,1 I recommended that you consider
using deque by default instead of
vector, on the grounds that deque is
easier to use because it's naturally more efficient for growth, and at the same time
deque isn't much slower across the board in real-world implementations
of the standard library.1 In the column2
before that, I criticized the standard vector<bool> specialization
because it forces on all users the choice of optimizing for space at the possible expense of speed, and
because it breaks the standard container requirements.
Astute reader Mark Frey wrote me about a seeming dichotomy in this advice. In part, he wrote:
"I find it a little bit hard to reconcile your complaint that this specialization
[vector<bool>] sacrifices speed for space savings on one hand,
with your recommendation that std::deque be preferred because of its
friendly memory use despite its performance penalty."
This is a good point, and others may be wondering about this, too, so let me explain why I don't think I'm
being inconsistent. I'm actually following a basic principle of optimization:
Don't optimize prematurely: Write for clarity and accuracy first, and only optimize once a profiler tells
you when and where it's needed.
From this principle we can draw both conclusions:
- The vector<bool> specialization is a classic premature
optimization in that it's forced on all users regardless of their programming environment and context,
and is rightly criticized on those grounds. (It also has a bigger problem that's not related to optimization:
It breaks container requirements and therefore can't be used reliably with generic algorithms.)
- The advice to "prefer deque" was based on the principle of writing
for clarity and accuracy first. When you know, by measuring, that the deque
is causing a performance bottleneck, it should of course be replaced with something faster. But in general
it's best to prefer habits that encourage simplicity and correctness first, and
here deque helps out because it eliminates the need to know about and
consider vector's reserve() baggage
when growing vectors.
References
- Sutter, H. "Standard Library News, Part 1: Vectors and Deques," C++ Report, 11(7), July/Aug. 1999.
- Sutter, H. "When Is a Container Not a Container?"
C++ Report, 11(5), May 1999.
* Note that set and multiset aren't
really "associative" in the usual sense of relating a lookup key with another value, like a dictionary does.
That's the job of map and multimap.
But all four of these containers appear in the C++ standard under the heading "Associative Containers," so
I'll follow suit to be consistent.