Decryptix! and Smarter Strings

. Jesse Liberty is President of Liberty Associates Inc., an object-oriented software development company, and the author of Teach Yourself C++ in 21 Days. He can be contacted at [email protected].


In previous columns, I have talked about Decryptix!, a project I'm developing as part of my forthcoming book C++ From Scratch.* You'll remember that Decryptix! is based on the children's game MasterMind by Pressman, in which one player creates a pattern of colors and the other player must guess the pattern. In Decryptix! we substitute letters for numbers, giving us patterns such as "a, b, c, d, e" instead of "blue, black, white, red, green." This allows us to develop this project on any platform, without worrying about graphics.

This month I'll talk about an object-oriented solution to a problem that arises in this game when the computer is guessing a pattern created by the player. For example, suppose the player has created a five-letter pattern, using seven possible letters (a-g) with no duplicates. Thus, the patterns abcde, abcdf, abcdg, abcea, and so forth are all legal. The computer must choose from among the 2,520 possible legal patterns to find the player's combination as quickly as it can.

Unfortunately, computers have no sense of intuition, and so can make no leaps of insight. If the computer progresses manually through all the possible strings from abcde to find the string gfdce, it will need to guess over 2,500 times before finding the pattern. However, a few simple adjustments based on the following rules will reduce this to six guesses, each of which will be generated almost instantly.

First, if the number of letters in a guess is equal to the number scored as correct, then obviously all the letters are correct. However, it is possible to play with duplicate letters, so the more general statement is, if the number of unique letters in a guess is less than or equal to the number of correct letters, then the unique letters must be correct. Let's call this the forcing rule.

The next observation is that if zero letters in the current guess match the solution, then we can eliminate all the letters in a guess from every position. We'll call this eliminate rule 1.

If we find that the number correct in a guess is exactly equal to the number already known to be forced (under the forcing rule), then all the non-forced letters can be eliminated. Thus, if the guess is abcde and we know that a and b are forced, and the number correct is two, then c, d, and e can be eliminated from all positions. We'll call this eliminate rule 2.

Now we examine how many letters are in position. If none is, then we can eliminate these letters from their current positions, and reduce our guesses. Thus, if we guess abcde and the in-position score is zero, then we know that a cannot be in position one, b cannot be in position two, and so forth. Call this eliminate rule 3.

We'll capture all of these rules in a SmartString class. A SmartString will consist of a vector of smart characters. Each character will know its current value (e.g., c) and the range of values possible in its position. The smart character doesn't need to know its position; that responsibility will belong to the string. The character just needs to know "I could be a, b, c, e, or g, but right now I'm c."

This is consistent with two of the most fundamental rules of object-oriented design:

  1. Create classes which have a small, coherent set of responsibilities.

  2. Put responsibility in the class with the appropriate knowledge.
The string class is only responsible for managing the string as a string, and the character class manages the individual characters. Thus, the string class knows if the string itself is consistent, and the character class knows which letters are possible for that character. We'll see how this works by examining the code in some detail.

Listing 1 shows the declaration for SmartString, Listing 2 shows its implementation, Listing 3 shows the declaration for SmartChar, and Listing 4 shows its implementation. The complete working code can be found on my web site: http://www.libertyassociates.com. Click on "Books and Resources" and navigate down to the section for C++ Report where you'll find a link to the complete source for Decryptix Smart String version. Please note, I've added line numbers to these listings to make the analysis easier to follow.

Each SmartChar is in a specific position within the SmartString. The member variable myCharacters, shown in Listing 3, keeps track of every legal character for the current position. myChar keeps track of the offset into the myCharacters vector of the current value for the position. Thus, if we are playing with five letters in five positions, and the current guess is abcde, then the smart character for position three will have a, b, c, d, and e in its vector, and 2 in myChar. (2 because a=0, b=1, c=2 and this character is currently c.)

As the computer learns that a given position cannot have certain characters, they can be eliminated from the myCharacters vector, and this will track per-position knowledge.

The member variable myChar is initialized to 0 (a) if duplicates are allowed, or to the next letter in sequence if not. Thus, if there are no duplicates, the first five-letter string would be abcde.

For example, if we were playing seven letters in five positions, with no duplicates, we'd create five SmartChar members in the SmartString myString vector. Each of these SmartChars would itself hold a vector of characters with the letters a, b, c, d, e, f, g—the legal letters—but myChar would be initialized to the offset of a for the first, b for the second, and so forth as illustrated in Figure 1.

figure 1

Figure 1.

THE COMPUTER GUESSES When it is time for the computer to play, it will offer a guess to the player, and then, based on the score received, it will endeavor to find the next possible string, eliminating all impossible strings without ever showing them.

Here is the essential excerpt of code from the Computer class's Play() method:

for ( ;; )
{
	Guess theGuess = OfferGuess();
	history.push_back(theGuess);
	//…
	if ( 
		! mySmartString->CanEliminateCharacters(theGuess) || 
		! IsConsistent(mySmartString->GetString()) 
		)
			GenerateAGuess();
}
The logic is this: offer a guess to the user and find out how many letters are correct and how many are in position. Create a Guess object to hold the string and the score and store that Guess in a vector of Guesses called History. If you have not won the game, see if you can eliminate characters before guessing again. Make sure that your next guess is consistent with all previous guesses.

The essential code for generating a guess looks like this:

do
{
	mySmartString->GetNext(); 
	//…
} while ( !IsConsistent(mySmartString->GetString()) );
That is, create new strings until one is found that is consistent with previous guesses. Before offering a new guess, the computer must ask "What if this guess were the actual answer, do the scores I have received so far make sense?" (That is, "are they consistent?")

For example, suppose the previous two rounds look like this: abcde - 5 letters/ 1 in position acbed - 5 letters/ 1 in position

The computer is ready to consider acdbe as a potential solution. Could this have been the correct answer all along? No, because if this were the correct answer, then the first guess would not have scored 5/1 but rather 5/2 (a and e would both have been in the right position). Because the first guess was scored 5/1, then acdbe cannot be the right answer, as it is not consistent with the previous guesses, so it can be eliminated!

It turns out that this simple consistency check is far more efficient at eliminating wrong answers than all the other rules, but we'll implement them all to give the computer the fastest performance.

For now let's focus on consistency checking by examining a game in which we play with 10 letters (a-j) in five positions with no duplicates, and the answer is ghijf. The computer first generates abcde.

We score this 0/0. Thus the computer knows that none of the characters guessed (a, b, c, d, e) appears in any position in the answer.(Eliminate rule 1.) This causes a call to RemoveCurrentCharactersInEveryPosition(), shown in Listing 2. The computer iterates through the string and extracts each character in the guess (abcde), removing it from all the smartCharacters by calling SmartChar::Remove() and passing in the character. It then adds each character to the vector deadCharacters if it is not already there:


123:           vector::iterator it2 = myString.begin(); 
124:           it2 != myString.end(); 
125:           it2++
126:           )
127:         { 
128:           it2->Remove(theChar);
129:           if (! In(deadCharacters,theChar) ) 
130:           {
131:                    deadCharacters.push_back(theChar);
132:                    cout < "eliminating="" "="">< thechar="">< endl;="" 133:="" anydeleted="true;" 134:="" }="" 135:="" }="" 136:="" }="" 137:="" }="" 138:="" return="" anydeleted;="" 139:="" }="">
The net effect of all of this is that a, b, c, d, and e are eliminated from every position. The computer will now return to the calling function, which will check whether the string that results from eliminating abcde is valid.

What string is that? fghij. Why? You'll remember that the first guess was abcde, which means that the myChar variables were set to 0, 1, 2, 3, 4 and the possible characters were [abcdefghij]. abcde is eliminated from each, so the characters left are [fghij]. Since the myChar value has not changed, these offsets now correspond to fghij. That is the very first string to check, and the computer now checks to see if it is consistent with the answers already in History.

The only thing in History is abcde. If the answer had been fghij we would have scored it 0/0, which is how we did score it, so this is consistent. Given that it is consistent, there is no reason to generate a new guess, and so the computer offers the player the guess fghij.

This we'll score 5/0. We have all the letters, but none is in the right position. Thus, the computer forces these five letters (forcing rule), eliminates the next 528 strings as inconsistent, and generates the guess gfijh. This scores 5/3. The next 248 strings are eliminated, and the computer generates ghijf. Thus the right answer is achieved in four guesses.

Let's look at one more example. This time, assume we are playing with seven letters in five positions with no duplicates and the computer guessing. Our answer is gadbe.

For the first guess the computer guesses abcde. We score this 4/1. The 4 correct letters are abde and the 1 correct position is e (the fifth).

The next guess, after eliminating 302 alternatives as inconsistent, is acbef. We'll score this 3/0.

There are no forced characters as a result of this or the previous guess, so we pick up the analysis on line 159 of Listing 2 where we call RemoveCurrentCharacters():


159:    if ( inPos == 0 ) 
160:    {
161:         anyDeleted = RemoveCurrentCharacters();
162:         return anyDeleted; // we did eliminate characters
163:    }
This instructs the SmartString that, because none of these letters is in the correct position, it can safely remove the current characters from the list of possible characters. (Eliminate rule 3.) That is not to say, for example, that there are no a's in the answer, only that the very first letter must not be a, the second letter must not be c, the third must not be b, the fourth must not be e and the fifth must not be f.

This branches to line 64 of Listing 2 where the computer iterates over myString and get each character in turn:


69:for (
70:    vector::iterator it = myString.begin(); 
71:    it != myString.end(); 
72:    it++
73:    )
74:{
75:    theChar = it->GetChar();
On line 77 each character is checked to make sure it is not in forcedCharacters—a vector of letters which cannot be deleted:

77:   if ( ! In(forcedCharacters,theChar) )
78:   {
79:        theChar = it->RemoveCurrent();
If it is not in forcedCharacters, the computer removes it by calling RemoveCurrent() on the smartChar. This branches to line 31 of Listing 4, where the character is extracted (so its removal can be reported), and then that character is erased from the array of possible characters:

37:    theChar = GetChar();
38:    myCharacters.erase( myCharacters.begin()+myChar );
39:    while ( myChar >= myCharacters.size() )
40:        myChar-;
Note that myChar (the offset into the array pointing to the current character) is unchanged as long as it's not pointing past the end of the array. Thus, if a character is removed, it points to a new character. If the character removed is the last in the array (in this case, g), then myChar must be decremented to ensure it points to a valid letter.

figure 2

Figure 2.

To review, before the characters were removed, our string looked like Figure 2. Immediately after the characters were removed our Smartstring looked like Figure 3.

figure 3

Figure 3.

The value of the offsets hasn't changed, but the string that results has changed, because letters were removed. For example, in the first character offset 0 used to point to a, but now it points to b. The same is true for all the others. The same five offsets which pointed to acbef now point to bdcfg. This value is checked for consistency before it is offered as a potential answer.

The point of this exercise is to see that by delegating responsibility to the SmartString for managing the string, and to the SmartCharacter for managing the characters, we can isolate these responsibilities and create code which is more robust, more encapsulated, and easier to maintain. In a subsequent column I'll extend this game and add object persistence.

Footnote: * QUE 1999, ISBN 0789720795.

About the Author

Jesse Liberty is the author of numerous books on C++ and object-oriented software development, including C++ from Scratch and Beginning Object-Oriented Analysis and Design. Please visit his Website at www.LibertyAssociates.com