Predicates vs. Function Objects

A function object that returns a Boolean value is a predicate. That's what almost all tutorials, books, and manuals write about predicates of the STL. This, however, is not the whole story.

Some years ago I tried to write some examples for the use of the remove_if() algorithm. After writing the usual stuff such as "removing all elements that have a certain value,'' I tried something different: using STL algorithms to remove only the third element. I wrote what I thought was an appropriate function object:


class Nth {        // function object that returns true for the nth call
  private:
    int nth;       // call for which to return true
    int count;     // call counter
  public:
    Nth (int n) : nth(n), count(0) {
    }

bool operator() (int) { return ++count == nth; } };

Then I wrote a mainline to exercise it:


// ... appropriate #includes and usings ...

int main()
{
    list<int> coll;

    // insert elements from 1 to 9
    for (int i=1; i<=9; ++i) {
        coll.push_back(i);
    }
    PRINT_ELEMENTS(coll,"coll:        ");

    // remove third element
    list<int>::iterator pos;
     pos = remove_if(coll.begin(),coll.end(), // range
                     Nth(3)),     // remove criterion
    coll.erase(pos,coll.end());

    PRINT_ELEMENTS(coll,"nth removed: '\");
}
The program defines a function object Nth that yields true for the nth time that its operator() is called, so I thought that passing an instance of Nth initialized by three to remove_if() would do the job. I reasoned that my remove_if() statement was saying: "Remove the element for which the third call of the passed function object yields true." Thus, this must be nothing else but "remove the third element."

But the result was a big surprise. The output of the program was (and usually is):


coll:          1 2 3 4 5 6 7 8 9
nth removed(): 1 2 4 5 7 8 9
That is, two elements, namely the third and sixth elements, were removed! This happens because the usual implementation of the algorithm copies the predicate internally during the algorithm:


template <class ForwIter, class Predicate>
ForwIter std::remove_if(ForwIter beg, 
                        ForwIter end,
                        Predicate op)
{
  beg = find_if(beg, end, op);
  if (beg == end) {
     return beg;
  }
  else {
    ForwIter next = beg;
    return remove_copy_if(++next, end, beg, op);
  }
}
The algorithm as implemented in the example uses find_if() to find the first element that should be removed. However, it then uses a copy of the passed predicate op to process the remaining elements (if any). Here, Nth in its original state is used again and removes also the third element of the remaining elements, which is in fact the sixth element.

So, is this implementation of remove_if() incorrect? No (at least, not according to the design of the STL)! The problem is that Nth(3) is not a valid object to pass as an operation to the above remove_if(). The preceding version of remove_if() is defined in a way that it requires the operation to be a predicate (there is also an overloaded version of remove_if(), for which you can pass a plain value).

So, what is a predicate? If you simply state that a predicate is a function or function object that returns a Boolean value (a value that is convertible to bool), then my Nth object is fine. However, there is an additional requirement that is unfortunately not mentioned in any manual or in the C++ Standard: A predicate should always return the same result for the same value.

Thus, if you call a unary predicate for two values x and y and both values are equal, then the predicate should yield the same result. In other words, a predicate should never change its state due to its usage as a function object. This rule is violated by my function object Nth.

What follows from this is that whenever you pass a function object as a predicate to an STL algorithm, the function object should guarantee that it always returns the same result when invoked with the same arguments. That is, the function object should not change its state due to the call. Or, in the language of C++: You should declare operator() as a constant member function (and not play games with mutable or casts). For the same reason, a copy of a predicate should have the same state as the original.

To my knowledge, in current implementations this predicate problem appears in other algorithms besides remove_if(). For example, the usual implementation of unique() for three arguments also passes two different copies of its binary predicate:


template <class ForwIter, class Predicate>
ForwIter std::unique(ForwIter beg, ForwIter end,
                     Predicate op)
{
  beg = adjacent_find(beg, end, op);
  return unique_copy(beg, end, beg, op);
}
However, if you use remove_copy_if() instead of remove_if(), usually all works as expected.

Note that it is possible to avoid this surprising behavior and to guarantee that this algorithm works as expected, even for a function object such as Nth, without any performance penalties. You could implement remove_if() in such a way that the call to find_if() is replaced by find_if()'s contents:


template <class ForwIter, class Predicate>
ForwIter std::remove_if(ForwIter beg, 
                        ForwIter end,
                        Predicate op)
{
  while (beg != end && !op(*beg)) {
    ++beg;
  }
  if (beg == end) {
    return beg;
  }
  else {
    ForwIter next = beg;
    return remove_copy_if(++next, end, beg, op);
  }
}
Library vendors are starting to make such changes. A few weeks ago one vendor told me that they will provide such a new implementation in their next release.

To enable programmers to write portable code, however, the committee has to "fix" this problem in a future version of the Standard by formulating the semantics of the algorithms so that this "surprising" behavior isn't standard-conforming anymore. Whether such a requirement is useful is currently under discussion (I favor it because I hate nonintuitive behavior, especially when it costs nothing to avoid it).

In the meantime, to be sure your code is portable you should ensure that your predicates always return the same result for the same values. Thus, you should always declare the function call operator of predicates as a constant member function.

About the Author

Nicolai Josuttis is the author of The C++ Standard Library—A Tutorial Reference, an active member of the C++ Standard Committee library working group, and a partner at System Bauhaus, a German group of object-oriented system development experts. He may be contacted at [email protected].