January 1999 Obfuscated C++

.
Last Month’s Obfuscated C++
Last month we asked you to explain the behavior of the following function. An important correction: The constants hsh and s were missing from last month’s printout.
/* 1*/const long hsh = 33554393;
/* 2*/const long s = 128;
/* 3*/
/* 4*/int
/* 5*/g(char* p, char* buf) {
/* 6*/ long h=0,h2=0,f=1;
/* 7*/ int o=0,len=strlen(p);
/* 8*/ while (*buf) {
/* 9*/    if (*p){
/*10*/        if(o)f*=s;
/*11*/        h=(s*h+(*p++))%hsh;
/*12*/    }
/*13*/else
/*14*/        h2-=f*buf[-len];
/*15*/    h2=(h2*s+*buf++)%hsh;
/*16*/    if (!*p&&h==h2)return o-len+1;
/*17*/    ++o;
/*18*/}
/*19*/return -1;
/*20*/}
This function does a text search. If the string pointed to by the first argument (p) appears in the string pointed to by the second argument (buf), the offset where the substring occurs is returned; otherwise –1 is returned. This function uses a form of the Rabin-Karp algorithm.1 This algorithm works by computing a hash value for p, and for each candidate substring in buf.

Of course, a brute-force approach—calculating the hash function from scratch for each substring in the target string—would not be any faster than simply comparing the strings. However, we avoid the extra computation by choosing the hash function carefully. In particular, if we already have the hash value for the substring starting at offset i, we can use this value to quickly compute the hash function for the substring starting at offset i+1.

Our hash function is computed by iterating through the characters in the string. At each step we multiply the hash value by 128 and add the value of the next character. The result is then computed modulus a large prime number (hsh, a bit over 33 million).

In our code, p is the target string, and buf is the searched string. The loop beginning at line 8 has been obfuscated to perform two separate calculations serially. The loop first iterates over p and buf, calculating (on line 11) the hash value for p, and calculating (on line 15) the hash value for the initial substring of buf. During this portion of the calculation, *p is non-null, so lines 14 and 16 are never executed. We could rewrite this section of code as follows:
/* Calculate h = hash on p, h2 = hash on initial 
substring of buf*/
/* 8*/ while (*buf && *p) {
/*10*/    if(o)f*=s;
/*11*/    h=(s*h+(*p++))%hsh;
/*15*/    h2=(h2*s+*buf++)%hsh;
/*17*/    ++o;
/*20*/ }
When this loop completes, we have also calculated f, which is 128 raised to the strlen(p) power. We’ll use this value in the next calculation.

Once p becomes null, lines 10 and 11 are replaced by lines 14 and 16, giving us the equivalent of:
/* 8*/ while (*buf) { // Calculate h2 = running hash 
on searched string
/*14*/    h2-=f*buf[-len];
/*15*/    h2=(h2*s+*buf++)%hsh;
/*16*/    if (h==h2)return o-len+1;
/*17*/    ++o;
/*20*/ }
For each pass through the loop, h2 becomes the hash for the substring that ends at buf. This is done by subtracting the contribution to the hash value from the character at buf[-len] (line 14), and then computing the next hash value by adding the character at *buf and reapplying the modulus (line 15). If the hash values match (line 16), we have found a match.

This code makes the simplifying assumption that matching hash values imply a match; to be safe, the code should double-check the substring to make sure we didn’t get a hash collision (unlikely given the algorithm and size of the hash space).


This Month’s Obfuscated C++

Write a legal ISO/ANSI C++ source file that contains no preprocessor directives or comments, and contains the character sequence:

;%

That is, a semicolon immediately followed by a percent sign. Make your file as short as possible.

Reference

1. Sedgewick, R. Algorithms, Addison–Wesley, Reading, MA,
p. 252, 1984.

Rob Murray is Manager, Engineering at the Irvine office of Net Explorer, an object-oriented software consulting company based in Houston, TX. He has taught C++ at technical conferences since 1987, and is the author of C++ Strategies and Tactics. He was the founding editor of the C++ Report and can be contacted at [email protected].