Rethinking How to Teach C++Part 7: Payback time

Our last six articles have described in detail the teaching philosophy behind our book Accelerated C++: Practical Programming by Example:1
  • Explain how to use language or library facilities before explaining how they work.
  • Motivate each facility with a problem that uses that facility as an integral part of its solution. The problem must be interesting in its own right, and not merely constructed to demonstrate the facility.
  • Present the most useful ideas first.

From a teaching perspective, the first parts of the book are the most interesting. There is a good deal of material that any book that purports to teach C++ programming must eventually cover, so whatever a book does not cover at the beginning it must cover later. The interest is in what parts the book describes first, and how useful those parts are to the programmer. We have described approximately one-third of the book. It is probably near this point that the book has departed furthest from conventional teaching strategies. We can see just how far this departure is by summarizing the important ideas we have covered by this point in the book and those we haven't yet covered:

  • We have covered the string type from the standard library, but not the char* type that is familiar to most C and C++ programmers.
  • We have covered the vector and list types, but not built-in arrays.
  • We have covered iterators and iterator operations, but not pointers.
  • We have covered several library algorithms, such as copy and find, but not the more primitive library functions such as strcpy and strchr.

In order to show how useful the subset of C++ is that we have described so far, we will solve one of the more interesting exercises from the book.1 It is number 5-1 and appears on page 99:

Design and implement a program to produce a permuted index. A permuted index is one in which each phrase is indexed by every word in the phrase. So, given the following input,

           The quick brown fox
           jumped over the fence
the output would be

            The quick    brown fox
    jumped over the    fence
    The quick brown    fox
                            jumped over the fence
                jumped    over the fence
                    The    quick brown fox
          jumped over   the fence
                            The quick brown fox

A good algorithm is suggested in The AWK Programming Language.3 That solution divides the problem into three steps:

  1. Read each line of the input and generate a set of rotations of that line. Each rotation puts the next word of the input in the first position and rotates the previous first word to the end of the phrase. So the output of this phase for the first line of our input would be
    
                 The quick brown fox
                 quick brown fox The
                 brown fox The quick
                 fox The quick brown
    
    Of course, it will be important to know where the original phrase ends and where the rotated beginning begins.
  2. Sort the rotations.
  3. Unrotate and write the permuted index, which involves finding the separator, putting the phrase back together, and writing it properly formatted.

The rest of this article will show a straightforward solution to this problem that uses only the subset of the language that the book has discussed up to this point. You are invited to try to solve the problem yourself before reading further.

THE DATA STRUCTURE
Let's look again at what has to happen in this program. For each line of our input, such as


     The quick brown fox
we shall have to create several phrases, one beginning with each word in the line:

     The quick brown fox
     quick brown fox The
     brown fox The quick
     fox The quick brown
When we get around to writing these phrases, each one will be split into a left-hand and a right-hand side, with the first part of the original sentence on the left and the second part on the right. We will keep track of this split by storing an index that represents the location of the first word on the left-hand side. The right-hand side contains the word on which the index is based, so it is never empty. Therefore, the indices associated with these lines are:

     The quick brown fox       4
     quick brown fox The       3
     brown fox The quick       2
     fox The quick brown       1
We will then sort these phrases, along with their indices, to obtain

     brown fox The quick       2
     fox The quick brown       1
     quick brown fox The       3
     The quick brown fox       4
and then use each phrase and its corresponding index to reconstruct the original input, lined up appropriately:

             The quick    brown fox
     The quick brown    fox
                    The     quick brown fox
                             The quick brown fox
From this output, you can see why the last line's index is 4 rather than 0; if it were 0, the entire sentence would appear before the gap.

We shall use a straightforward representation for this data structure:


     typedef vector Phrase;

     struct Rotated_phrase {
          Phrase words;
          Phrase::size_type bound;
     };
We are using the name Phrase as an abbreviation for vector because it occurs so many times. A Rotated_phrase structure will represent a single line of output. We shall write that line by writing the elements of words, beginning at the element with index bound and continuing until we reach the end of words. Then we shall write a few spaces and continue with the initial element of words. We'll stop just before we reach the element with index bound, at which point we'll have written all of the elements of words.

This representation may appear odd. In effect, we are putting the beginning of each output line at the end of the words member, and the end of each output line at the beginning. However, by doing so, we greatly simplify the work of sorting the output, because we can simply sort all the vectors in dictionary order.

In other words, when it comes time to sort the array of Rotated_phrase objects, we shall do so by using the following comparison function:


     bool compare(const Rotated_phrase& x,
                      const Rotated_phrase& y)
     {
          return x.words < y.words;="" }="">
In doing so, we take advantage of the fact that all of the standard container classes define the relational operators in terms of the corresponding relational operators on their elements.

TWO AUXILIARY FUNCTIONS
Now that we have decided on our data structure, we are nearly in a position to write the program. Before we do so, however, we shall pause to write two auxiliary functions that will prove useful.

The first function writes a sequence of words with spaces between them. This function is not terribly complicated, but it is not completely trivial either because it needs to avoid putting stray spaces before the first word or after the last one:


void write_phrase(ostream& out,
                 Phrase::const_iterator begin,
                 Phrase::const_iterator end)
     {
          if (begin != end) {
               out < *begin;="" while="" (++begin="" !="end)" out="">< "="" "="">< *begin;="" }="" }="">
We begin by checking for an empty range, in which case we do nothing at all. If the range is not empty, we write the first element. Then, for each element after the first, we write a space and then the element. This strategy ensures that the only spaces we write will separate the words in the phrase.

The second function determines how much space a phrase will occupy in the output, including the spaces between the elements, without actually writing any output. We shall need this function later in order to arrange for the output to line up properly:


      string::size_type phrase_length
           (Phrase::const_iterator begin,
                Phrase::const_iterator end)
      {
          string::size_type sum = 0;
          if (begin != end) {
               sum = begin->size();
               while (++begin != end)
                    sum += begin->size() + 1;
          }
          return sum;
     }
This function is similar in structure to write_phrase. If the range is empty, the width is zero. Otherwise, we compute the length of the initial word; then for each word beyond the first, we add the length of that word and one more for the space between the word and its predecessor.

THE MAIN PROGRAM
We are now ready to solve the problem. We begin by declaring a variable that we will use to hold all of the rotated phrases:


      vector phrases;
As we create the rotated phrases, we will also need to keep track of the length of the longest left-hand part of any phrase. We need this information in order to determine how many spaces to write at the beginning of output lines. For example, if the output is

             The quick    brown fox
     The quick brown    fox
                    The    quick brown fox
                            The quick brown fox
the longest left-hand part is The quick brown; any phrases with a shorter left-hand part will have to be padded to the width of that longest phrase.

It should be clear that the width of the longest left-hand side will be the width of one of the input sentences without its last word. Therefore, we will use our phrase_length function to determine the appropriate width for each line as we read it. We must define a variable to store the maximum length itself:


      string::size_type max_size = 0;
At this point, we can read the input. We will define a variable to hold each line as we read it, and then write a loop to do the actual reading. The loop will begin by taking the line and splitting it into words. We will define a Rotated_phrase object and use its words member to store those words:

string line;
while (getline(cin, line)) {
     phrase.words = split(line);
The split function here is the one that we defined in Part 5 of this series. It takes a string as its argument and returns a vector; that is, a Phrase in which each element is a word of the string. Our next job is to check whether the line actually had any words on it. If so, we compute the length of the phrase without its last word, and keep track of the greatest such length:

     if (phrase.words.size()) {
           max_size = max(max_size,
                    phrase_length(phrase.words.begin(),
                    phrase.words.begin() + 
                                 phrase.words.size() -1));
If phrase.words.size() is zero, it is meaningless to talk about the phrase without its last word because the phrase has no last word. For that matter, if phrase.words.size() is zero, the phrase will contribute no entries to phrases. Therefore, we skip the entire following computation if phrase.words.size() is zero.

Once we have determined that there is at least one word in our phrase, we need to cycle phrase.words through all of its rotations and give an appropriate value to phrase.bound. Those rotations, and their corresponding values, will look like this:


     words                       bound

     fox The quick brown         1
     brown fox The quick         2
     quick brown fox The         3
     The quick brown fox         4
For each rotation, the value of bound is the element of words that begins the left-hand side of the output. So, for example, the first rotation will have The quick brown as its left-hand side, and fox as its right-hand side. Recall that the last rotation has a bound of 4, rather than 0, because the entire phrase is on the right-hand side of the output, not on the left-hand side.

There are many ways of achieving this kind of rotation. The following one is particularly succinct:


             // Generate every possible rotation of phrase.words,
             // and associate each rotation with a different integer,
             // starting from 1
             // Each time through this loop, we first rotate
             // phrase.words one position to the right, and
             //then push it and its corresponding integer
             // onto the back of  phrases.
      for (phrase.bound = 1;
           phrase.bound <= phrase.words.size();="" ++phrase.bound)="" {="">// remove the last word from phrase.words
           string last_word = phrase.words.back();    
           phrase.words.pop_back();

      // insert the word at the beginning of phrase.words
           phrase.words.insert
                   (phrase.words.begin(), last_word);
           phrases.push_back(phrase);
        }
     }
}
We copy the last element of phrase.words into a local variable named last_word, and delete that word from phrase.words. Next, we insert at the beginning of phrase.words the word that was formerly at the end. The for loop has given phrase.bound an appropriate value, and we have given phrase.words an appropriate value, so we use push_back to append a copy of phrase to our phrases variable.

At this point, we have read all the input, and phrases contains all of the rotated phrases that correspond to all of the input. Moreover, we have already defined a compare function that will determine the relative order of two phrases. Therefore, the second part of our problem—sorting the rotated phrases—is simple:


  sort(phrases.begin(), phrases.end(), compare);
What remains is to write the sorted phrases. To do so, we will write a for loop that visits every rotated phrase:

     for (vector::const_iterator 
          i = phrases.begin();
          i != phrases.end(); ++i) {
For each phrase, we will define an iterator that identifies the starting point of the left-hand side:

     Phrase::const_iterator bound = 
        i->words.begin() + i->bound;
Remember that the left-hand side of the output starts at the element with index bound and ends at the end of the phrase; the right-hand side starts at the beginning of the phrase and ends immediately before the element with index bound.

We must now write a number of spaces that is sufficient to pad the left-hand side to a common width. To compute the appropriate number, we will compute the length of the left-hand side and subtract it from max_size, which we calculated earlier. Then we will construct a string with that many spaces and write it:


     cout < string(max_size="" -="" phrase_length(bound,="" i-="">words.end()),
                         ' ');
All that is left to do is to write the left-hand side, some spaces, and the right-hand side, and end the line:

          write_phrase(cout, bound, i->words.end());
          cout < "="" ";="" write_phrase(cout,="" i-="">words.begin(), bound);
          cout < endl;="" }="">
DISCUSSION
With three exceptions, this program uses only language and library facilities we discuss in the first 100 pages of Accelerated C++. The three exceptions are the back and pop_back member functions and the use of < to="" compare="" entire="">vectors.

It is trivial to get along without using back: Simply replace phrase.words.back() with phrase.words[phrase.words.size() - 1]. Similarly, we can get along without pop_back by using the erase function, which we discuss in the first 100 pages of the book.

It is harder to get by if you don't realize that < works="" on="" entire="">vectors. However, it is still not difficult. All it means is that the compare function, instead of having a one-line body, will have a body of a dozen lines or so.

However, if we were using this example in the classroom, we would keep it as it is, because it does something we think is important from a pedagogical viewpoint: It expands the students' universe a little, and does so in a way that falls naturally out of the example. Indeed, students who have tried to solve this problem by themselves, and then see the simplifications that are possible because of facilities, such as back and vector comparison—that are just a little beyond what they have seen so far—will remember those facilities the next time they encounter a problem that can use them.

Of course, this program can stand a good deal of improvement. In particular, it stores n copies of every n-word phrase, one for each rotation. We can do much better than that: Instead of storing each rotation separately, we can define one data structure that we will use to store each line of the input, and a second data structure that we can use to note where in the first data structure to find the information needed to reconstruct each rotation.

The usual way to implement data structures that refer to elements of other data structures is by using pointers, and we haven't talked about these yet. Indeed, Accelerated C++ won't talk about pointers for another 68 pages. Nevertheless, we have enough intellectual ammunition to solve this space problem without pointers, using data structures that refer to other data structures, and in a way we think is every bit as compact and efficient as it would be if we were to use pointers.

The revised solution will be the subject of our next article.

REFERENCES

  1. Koenig, Andrew and Barbara Moo. Accelerated C++: Practical Programming by Example, Addison–Wesley, Reading, MA, 2000.
  2. Koenig, Andrew and Barbara Moo. "Analysis of a Classroom Exercise, Part 5: Working with Strings," Journal of Object-Oriented Programming, 13(11): March 2001.
  3. Aho, Alfred V., Brian Kernighan, Peter J. Weinberger. The AWK Programming Language, Addison–Wesley, 1988.

About the Authors

Andrew Koenig is a Principal Technical Staff Member at AT&T Research and Project Editor of the ISO/ANSI C++ standards committee. He is co-author with Barbara Moo of the books Ruminations on C++ and Accelerated C++: Practical Programming by Example.

Barbara Moo retired from AT&T in June 1998. While at AT&T she worked on the first commercial product written in C++. She is co-author with Anrdrew Koenig of the books Ruminations on C++ and Accelerated C++: Practical Programming by Example.