January 1999 Obfuscated C++ -- ADTmag

January 1999 Obfuscated C++

By Rob Murray
October 4, 2000

Last Month’s Obfuscated C++
Last month we asked you to explain the behavior of the following function. An important correction: The constants hsh and s were missing from last month’s printout.

/* 1*/const long hsh = 33554393;
/* 2*/const long s = 128;
/* 3*/
/* 4*/int
/* 5*/g(char* p, char* buf) {
/* 6*/ long h=0,h2=0,f=1;
/* 7*/ int o=0,len=strlen(p);
/* 8*/ while (*buf) {
/* 9*/    if (*p){
/*10*/        if(o)f*=s;
/*11*/        h=(s*h+(*p++))%hsh;
/*12*/    }
/*13*/else
/*14*/        h2-=f*buf[-len];
/*15*/    h2=(h2*s+*buf++)%hsh;
/*16*/    if (!*p&&h==h2)return o-len+1;
/*17*/    ++o;
/*18*/}
/*19*/return -1;
/*20*/}

This function does a text search. If the string pointed to by the first argument (p) appears in the string pointed to by the second argument (buf), the offset where the substring occurs is returned; otherwise –1 is returned. This function uses a form of the Rabin-Karp algorithm.¹ This algorithm works by computing a hash value for p, and for each candidate substring in buf.

Of course, a brute-force approach—calculating the hash function from scratch for each substring in the target string—would not be any faster than simply comparing the strings. However, we avoid the extra computation by choosing the hash function carefully. In particular, if we already have the hash value for the substring starting at offset i, we can use this value to quickly compute the hash function for the substring starting at offset i+1.

Our hash function is computed by iterating through the characters in the string. At each step we multiply the hash value by 128 and add the value of the next character. The result is then computed modulus a large prime number (hsh, a bit over 33 million).

In our code, p is the target string, and buf is the searched string. The loop beginning at line 8 has been obfuscated to perform two separate calculations serially. The loop first iterates over p and buf, calculating (on line 11) the hash value for p, and calculating (on line 15) the hash value for the initial substring of buf. During this portion of the calculation, *p is non-null, so lines 14 and 16 are never executed. We could rewrite this section of code as follows:

/* Calculate h = hash on p, h2 = hash on initial 
substring of buf*/
/* 8*/ while (*buf && *p) {
/*10*/    if(o)f*=s;
/*11*/    h=(s*h+(*p++))%hsh;
/*15*/    h2=(h2*s+*buf++)%hsh;
/*17*/    ++o;
/*20*/ }

When this loop completes, we have also calculated f, which is 128 raised to the strlen(p) power. We’ll use this value in the next calculation.

Once p becomes null, lines 10 and 11 are replaced by lines 14 and 16, giving us the equivalent of:

/* 8*/ while (*buf) { // Calculate h2 = running hash 
on searched string
/*14*/    h2-=f*buf[-len];
/*15*/    h2=(h2*s+*buf++)%hsh;
/*16*/    if (h==h2)return o-len+1;
/*17*/    ++o;
/*20*/ }

For each pass through the loop, h2 becomes the hash for the substring that ends at buf. This is done by subtracting the contribution to the hash value from the character at buf[-len] (line 14), and then computing the next hash value by adding the character at *buf and reapplying the modulus (line 15). If the hash values match (line 16), we have found a match.

This code makes the simplifying assumption that matching hash values imply a match; to be safe, the code should double-check the substring to make sure we didn’t get a hash collision (unlikely given the algorithm and size of the hash space).

This Month’s Obfuscated C++

Write a legal ISO/ANSI C++ source file that contains no preprocessor directives or comments, and contains the character sequence:

;%

That is, a semicolon immediately followed by a percent sign. Make your file as short as possible.

Reference

1. Sedgewick, R. Algorithms, Addison–Wesley, Reading, MA,
p. 252, 1984.

Rob Murray is Manager, Engineering at the Irvine office of Net Explorer, an object-oriented software consulting company based in Houston, TX. He has taught C++ at technical conferences since 1987, and is the author of C++ Strategies and Tactics. He was the founding editor of the C++ Report and can be contacted at [email protected].

Featured

AppTrends

Email Address*Country*

Please type the letters/numbers you see above.

Upcoming Training Events

0 AM

VSLive! 3-Day Hands-On Training Seminar: Master Modern JavaScript: Unlock the Full Potential of Your Code
June 2-4, 2025

VSLive! 2-Day Hands-On Training Seminar: Asynchronous and Parallel Programming in C#
June 24-25, 2025

VSLive! 4-Day Hands-On Training Seminar: Immersive .NET Full Stack Training: 4-Day Hands-On Experience
July 15-18, 2025

Securing IT in the AI Era
July 23, 2025

VSLive! 4-Hour In-Depth Workshop: Immersive .NET Full Stack Training: C# Interfaces: Effective Usage while Avoiding Pitfalls
July 29, 2025

Visual Studio Live! @ Microsoft HQ
August 4-8, 2025

4-Hour VSLive! Workshop: Testability in .NET
August 27, 2025

Visual Studio Live! San Diego
September 8-12, 2025

Live! 360 2-Day Hands-On Seminar: Swimming in the Lakes of Microsoft Fabric and AI – A Hands-on Experience
September 18-19, 2025

VSLive! 2-Day Hands-On Training Seminar: Hands-On with .NET Web Development in 2025
October 7-8, 2025

Live! 360 Orlando
November 16-21, 2025

Artificial Intelligence Live! Orlando
November 16-21, 2025

Cloud & Containers Live! Orlando
November 16-21, 2025

Cybersecurity & Ransomware Live! Orlando
November 16-21, 2025

Data Platform Live! Orlando
November 16-21, 2025

Visual Studio Live! Orlando
November 16-21, 2025

VSLive! 4-Day Hands-On Training Seminar: Immersive .NET Full Stack Training: 4-Day Hands-On Experience
December 16-19, 2025

Visual Studio Live! Las Vegas
March 16-20, 2026

Free White Papers

More Tech Library