Algorithm = Iterator + Visitor

The authors demonstrate that an algorithm implementation can be generalized into a composition of iterators and visitors (higher-order function objects). This is known from STL and from research in generic and functional programming, but it has not been demonstrated that the notion of iterators and visitors is sufficient for allowing polytypism in entire categories of algorithms without sacrificing on polymorphism in data structures. This paper emerges from recent experiments in the design of the Extended Function Library (EFLIB), a framework developed by Johan L. Larsson. These experiments show how large groups of algorithms can be classified using a set of visitors and iterators, and how data structures can be decomposed into lists and generators to achieve polytypism in an object-oriented language, thus making a program more open to change.

Reuse of data structures sometimes contradicts reuse of algorithms because both require flexible adaptation and extensibility. Many libraries have successfully reused data structures, but the reuse of algorithms remains a difficult problem. It is difficult to make algorithms reusable because they are often strongly tied to specific data structures. A generic algorithm must be applicable to many data structures, even to unforeseen data structures that may be added to the library after the generic algorithms are designed. The Standard Template Library (STL) for C++ [Musser96a] is a library that combines data structure reuse and algorithm reuse, but STL has flaws. For example, a sequence algorithm cannot easily apply to user-provided graph data structures. Other libraries, such as LEDA [Melhorn98a], are more complete, but there are still problems with flexible adaptation and extensibility. Karla [Frick94a], a library with data structures and algorithms for the Sather language, emphasizes robustness. The demand of robustness makes algorithm and data structure reuses an even more difficult task.

Figure 1
Figure 1. Algorithm and data structure decomposition: structure vs. data.

We consider an algorithm to be a computational procedure that transforms some given input into some output during a finite number of precisely defined steps [Knuth97a]; in that sense, we view an algorithm as a (higher-order) function that maps input to output. We will demonstrate a particular object-oriented decomposition of algorithms and data structures that partially solves the reuse problem. The experiments emerge from EFLIB. The approach presented splits an algorithm into two computational parts: the part that involves structure (the iterator) and the part that involves data (visitor). This decomposition adds flexibility and genericity. However, to support generic algorithms, we must also decompose data structures. Data structures are decomposed into generators and lists. The former generates the structure of the output and the latter is the data in its purest form (see Figure 1).

Similar decompositions have been done before in STL and LEDA in terms of function objects, and have been discussed by Kühl and Weihe [Kühl96a] in terms of loop kernels. However, neither STL nor LEDA provides a broad basis for algorithm reuse. Algorithms in these libraries are still tied to only a few types of data structures. For example, a graph algorithm in LEDA which views graphs as tables of edges and nodes, or any subalgorithm it may use, is not generic with respect to types: it only applies to graphs represented using tables. STL considers function objects, small-grained and non-generic encapsulations of functions. Such function objects are not sufficient to allow for full algorithm reuse. They depend on template classes, which do not allow for polytypism. Function objects are merely operations that apply to single elements; we consider function objects to be more general. We suggest that visitors, derived from the visitor design pattern [Gamma95a], are better suited abstractions of algorithm subcomputations because they can communicate subcomputations, can accumulate partial results, and can apply to many elements at the same time. Recursive functions analogous to function objects have been investigated by Bird [Bird87a] resulting in a theory of lists, and later by Malcolm [Malcolm90a] resulting in an extension of Bird's theory. Recursion replaces the need for iterators, but eliminates the flexibility added by a separate abstraction. STL defines iterators and combines them with function objects resulting in some of the list operations devised by Bird. However, the extension of Bird's theory into other data types has no counterpart in STL. Weihe and Kühl [Kühl96a] have suggested structural iterators as a complement to the linear iterators in STL, but structural iterators are specific to graphs. Something is lacking in STL and LEDA before Malcolm's extension can apply. We believe it is the introduction of generators as opposed to iterators. The theories of Bird and Malcolm would give an object-oriented algorithm library more flexibility than any functional approach because iterators are separate, extensible abstractions. Iterators allow for some generality, but we will show that linear iterators cannot give us polytypism. Structure iterators as defined by Nissen, Kühl and Weihe [Nissen96a] can provide polytypism but require that every data type be viewed as a graph.

In the following section, we motivate the reuse of algorithms and relate our work to research in generic and functional programming. In later sections we'll describe the decomposition of algorithms into iterators and visitors and give examples of problems that occur when this abstraction is used; describe a similar decomposition of generic data structures; study elementary generic algorithms and evaluate our decomposition technique; and finally, we'll conclude with brief comparisons of related approaches and summarize our contribution.

Reuse is motivated by improvements in quality and productivity. Quality can be improved by reusing algorithms because the most efficient algorithms are written by experts using heuristics and techniques that cannot be found in textbooks [Weihe98a]. Reuse is a quality assurance for algorithms: contiguous reuse means contiguous testing. Productivity improvements are also to be expected from correct algorithm reuse. However, today's algorithm and data structure libraries are very complex and do not provide sufficient infrastructure to ease the adaptations. Weihe [Weihe97a] lists four key goals for the design of reusable algorithms. A reusable algorithm must be easy to adapt, rely on small and polymorphic data structure abstractions, should be implemented so that its algorithmic functionality is flexibly adaptable, and should be inspectable. We consider only three characteristics: genericity, flexibility, and simplicity. An algorithm should be generic because it is much more valuable to reuse something generalized. Flexibility includes adaptability, which is important to allow the generalized algorithm to easily turn into specific algorithms. Flexibility also allows for reconfiguration of an algorithm even after it has been composed, which is impossible in STL and other template-based libraries such as LEDA. Simplicity is crucial for algorithms. Reuse can be much more productive if we do not force complete understanding of the underlying implementation. Shared interfaces provide a unification of interfaces thus promoting flexibility, but they also makes it easier to understand algorithms. Further, certain hidden run-time mechanisms can make algorithms simpler to use by transparently making decisions, for example selecting the most effective subcomputation among those that are currently available in the system.

The first step towards algorithm reuse is to decouple algorithms from data structures. Such decoupling can easily be done in object-oriented programming by providing abstract base classes for data structures, for example lists and trees. Although abstract classes may allow for some generality, it is not enough because of two reasons: 1) a generic algorithm should apply to lists, trees and any other data structure with any other interface (polytypism) and 2) flexibility is far too low in a solution relying only on inheritance. We present a solution to the reuse problem that involves decomposition of algorithms and data structures. However, decomposition implies more complexity. Such complexity must be handled by explicit mechanisms. The library concept is therefore not the suitable form for algorithm reuse. A framework is better suited because it may provide infrastructure that binds algorithms and data structures together, for example mechanisms that allow for better adaptations (without massive parameter passing at compile time). A framework puts considerably less complexity in the user's view [Larsson0a].

Algorithm Libraries
Previously, we mentioned that it is difficult to provide a library where both algorithms and data structures are flexible, extensible, and robust. We'll consider three examples that indicate why: STL, LEDA, and Karla. All three are object-oriented libraries aiming at the reuse of algorithms and data structures. STL emphasizes both algorithms and data structures but only considers sequences. LEDA emphasizes graph algorithms and provides complex monolithic implementations of graph data structures. Karla allows very little genericity and aims at robustness.

STL [Musser94a] relies on templates, which are a static form of polymorphism. Templates prohibit flexible adaptation of algorithms and data structures because once a template is parameterized, the parameters cannot change. Further, parameterized classes and functions are complicated to understand. Parameterization cannot be supported at run time. Although default parameters can help us a little, adaptation of STL must always require understanding and consideration of parameters. STL distinguishes four conceptual entities (Figure 2) in the context of generic algorithms: the algorithm, the iterator, the function object, and the data structure (container). We believe that decomposing data structures and changing the role of function objects should improve this decomposition.

Figure 2
Figure 2. STL conceptual entities.

STL claims to be "open and orthogonal and thus extensible" [Musser96a], but there's more to extensibility than being orthogonal and open. Abstractions provided by STL are not fundamental enough to easily allow for the introduction of graph or tree data structures. Research has shown that such data structures require additional iterators: linear iteration over nodes and edges is not sufficient. Algorithms provided in STL only apply to sequences.

LEDA, the Library of Efficient Algorithms and Data Structures version 3.7.1, focuses on efficiency and ease of use [Melhorn95a]. LEDA provides graphs and graph algorithms. There are also a few elementary data structures such as sequences and dictionaries, but LEDA comes with no generic algorithms for use with them. A graph in LEDA is a very complex data type with a wide interface. LEDA may support advanced graph algorithms, but it is not general enough to consider polytypic algorithms. Fundamental algorithms, for example partitioning and map operations, should apply to every data type to allow for reuse, not just graphs.

Karla [Frick94b] is an attempt to provide reusable and robust algorithms. Genericity in Karla is mainly for construction of data structures [Frick94a]. We believe that generic algorithms are of at least equal importance, because they often are harder to design. Focusing on either algorithms or data structures is not enough. We must make both of them generic to allow for polytypism. Frick et al [Frick94b] observe that iteration is a recurring task for algorithms, and they generalize iteration into Sather streams. The stream approach restricts the reusability of algorithms considerably. A Karla algorithm must have a sequence as either input or output. Combining an algorithm that turns a data structure into a stream and an algorithm that turns a stream into a data structure results in an approach similar to ours, but Frick et al do not consider such an approach.

Richard Bird has developed a theory of lists involving generalized operators such as map, reduce, accumulate, and filter [Bird87a] . Bird's theory is directly applicable to functional programming because it relies on parametric polymorphism and recursion. The theory of lists has been extended to arbitrary data types [Malcolm90a]. In recent years, many other researchers have investigated generalized operators in a functional setting [Hoogendijk97a]. The use of generalized operators for arbitrary data types is known as polytypism and sometimes as shape polymorphism. Polymorphism means that something can exist in many forms. For example, inclusion polymorphism is the same as inheritance in object-oriented languages. The core idea is that a subclass can be used instead of a class if it also is a subtype [Budd97a] (i.e., superclasses are included in the interface provided by a subclass). Polytypism is a form of polymorphism that has been used in functional languages for some time. Polytypism means independence of types. A polytypic program is a program that works for types with different classes without further change. In a polytypic view, there is no need for a specific algorithm implementation for a new type added to the system. Instead, implementation of a polytypic operation is shared among every type, even types that are added long after the operation itself was implemented. An example of a polytypic operation defined by Bird is map. Map takes any data structure and applies a function to every element, resulting in another data structure, with the same structure but possibly with different data (elements). Another example is Bird's filter operation that selects a subset of the elements in a data structure using a predicate parameter (a function returning true or false for each element). Both of these iterate over the data structure by means of recursion and apply the same function to every element. A recursive function may remember previous results in parameters.

List operations
Many algorithms apply to lists (sequences), for example, zip, concatenation, reverse, and other higher-order functions such as map, fold, and filter. Bird has formalized many of these operations for lists in the context of functional programming, for example, ( map f xs = [ (f x1) ... (f xn) ] ). A map and any other list operation can easily be implemented in object-oriented programming using iterators and visitors. Visitors that apply to lists are known as function objects in STL. In STL, function objects are parameters to algorithms. Consider for example the "map" algorithm acting on an STL sequence:

template <class InputIterator,
            class Sequence,
             class BinaryOperation>
   T map <InputIterator first,
      InputIterator last, Sequence result,
      Function f)
      while (first != last)
        /* Insert output of function into result */
        result.insert (first, f (*first++));
       return result;
STL defines list operators as functions. In the example, map is parameterized with respect to iterators and visitors, although the visitors, a.k.a. function objects in STL, are restricted to be mere operations encapsulated in an object. In EFLIB, visitors are more than operations that apply to single elements. A visitor can itself perform subcomputations or accumulate results. A visitor need not visit just one element at a time. A block iterator can well pass the visitor an element that is itself a data structure, for example, a partition of elements from the iterated data structure. This allows for better decomposition and thus improved reuse of algorithms.

A data structure has two sides: the structure and the data stored in the structure. For example, a graph or a list may contain the same elements, but the structure can be different. This leads to a fundamental conclusion about algorithms:

Algorithms can typically be decomposed into two computational parts with regard to the input: the computational part that considers only the structure, and the computational part that considers only the data.
A generic algorithm should be both polytypic (with regard to structure) and polymorphic (with regard to data, i.e., the elements). A generic algorithm should also be decomposed into smaller entities: iterators and visitors. This gives rise to our main thesis:

Algorithm = iterator + visitor

Iterators are the means for repeating one or more computational steps over and over again. At each step, the algorithm knows the current position in the computation by a set of iterators pointing to elements. A generic algorithm is further decomposed with respect to the computational steps, represented as visitor classes in object-oriented programming. A visitor visits elements as the iterator moves forward. Generic algorithms can be made very flexible in object-oriented programming because the iterator abstraction separates an algorithm from the structure it is operating on (see Figure 3). Furthermore, because many algorithms share visitors or even groups of visitors, there can be a consistent and small—but yet complete—set of building blocks for algorithms. A program that uses a set of generic algorithms can therefore easily replace or adapt algorithms to change the overall operation of the program. This conclusion is important for a framework because algorithms must be a central part of any application.

Figure 3
Figure 3. Algorithm abstraction. A generic algorithm is decoupled from the input data structure. It relies on the iterator to generalize the traversal of the hidden structure of the input, and relies on a generator to reconstruct the structure in the output. Data is hidden in accessor objects.

There are essentially two approaches to iteration abstraction in object-oriented programming: iterator objects [Stroustrup90a] and iterator methods [Murer96a]. Many variations exist for these approaches. For example, an iterator object can be viewed as a stream [Aho83a] rather than a cursor. Invariant control is crucial: should the iterator decide when the iteration terminates, the number of steps to move forward, etc, or should these decisions be taken outside the iterator? Smalltalk uses iterator methods for full encapsulation of a loop. Only predicates (blocks that evaluate to true or false) may be passed into the iterator method. The Sather language [Murer96a] takes another approach to iterator methods and considers iterators separate from the loop language constructs. Implementing iterators in methods gives them low flexibility and generality. An iterator should be a separate object. In C++, iterator objects are separated from the loop construct. An iterator in C++ is essentially a cursor surrounded by a for-loop. Iterator objects are more flexible because we can easily design a new iterator class that applies for every data structure. Separate objects promote information hiding because they can be used through a narrow superclass interface. This is why the representation of iterators as separate objects is crucial for the reuse of algorithms.

A visitor is a higher-order function that need not only apply to one element at a time. A visitor has a state and can therefore accumulate partial results (a tree in Figure 4). The visitor visits elements via accessor objects (small circles in Figure 4). However, a visitor is not a complete generalization of an algorithm. An algorithm often involves repetitive steps, calling for iterator objects. These iterative steps can be placed in a loop kernel.

Figure 4
Figure 4. Principal structure of a visitor.

We believe that algorithm reuse demands a particular form of generic data structure. Such a data structure should be decomposed into a generator and a list.

Data Abstraction
An algorithm should not directly apply to a concrete data structure but rather to an abstract and minimized logical representation of the underlying physical data structure. Data can be wrapped in generalized accessor objects providing a wall and a unary interface. The algorithm is parameterized with an iterator, which further encapsulates the accessors. This gives rise to a two-fold data structure abstraction in EFLIB (see Figure 5).

Figure 5
Figure 5. Data abstraction. The figure shows two walls around a data structure: the logical representation and the element accessors.

Kühl and Weihe [Kühl96a] claim that "implementing all [fundamental graph] algorithms on a couple of standard data structures is often not possible" and that we, therefore, must focus on algorithms as primary units for reuse in contrast to STL, LEDA, and other libraries. Our goal is to reuse both algorithms and data structures, and our approach is based on a comprehensive decomposition of them into smaller subcomponents. We have found that a data structure can be decomposed into two important subcomponents. This decomposition allows for much better reuse of algorithms with respect to standardized data structures. The key observation is that every data structure can be generated from a list and a generator component specific to the data structure. An iterator traverses a data structure element by element, thus giving rise to a list. The order of elements in that list depends on the iterator. Iterators therefore reduce a data structure into a list. Important information about the structure of elements will disappear because a list cannot express the structure of, for example, a graph or a tree. To allow algorithms to be truly generic, we must be able to regenerate the structure even after an iteration. Therefore, each iterator should also have a generator, which reverses the "degeneration". For example, an iterator may turn a tree into a list, and the corresponding generator would allow us to rebuild the tree again from the elements and their linear order inside a list. This gives rise to a second main thesis:

Data structure = list + generator

A generator can mathematically be viewed as a function that takes an element ε and a data structure α and returns a new data structure ß isomorphic to α but containing ε, that is:

Generator: ε + α ♦ ß

We can generate any data structure from a list if we know all about its structure. The question is what minimum number of generators (and associated iterators) are sufficient for generic algorithms. A complete classification of generic algorithms into groups depending on the required generators is outside the scope of this paper. We will briefly investigate the components of some elementary algorithms. An important point to notice here is that not every iterator has an inverse. For example, DFS and BFS do not. However, a generator can and should be loosely coupled to a data structure. Looking up the adjacent nodes in the original graph A can easily generate an isomorphism from a graph A, for example. That is, the problem is solved by having slightly more specialized generators, which are connected to a data structure object (Figure 6).

Figure 6
Figure 6. Generator. In the view of an algorithm, a generator appears as an isolated component of the input data structure. A wall separates the current generator from the data structure itself.

A generator need not be a bottleneck for efficiency. A well-chosen implementation should only require O(n) time to generate a data structure from a list. By pushing the generator into the algorithm we can completely eliminate the overhead.

A traversal over elements sometimes modifies the underlying data structure. Such modifications can concern individual elements (i.e., the data), but also many elements (i.e., the structure). Modifications introduce a problem because many iterators may be traversing a data structure at the same time. References to elements are easily damaged and worse, an algorithm cannot rely on the data structure to be the same even in the shortest of computations. Access control is evidently required. ET++, an application framework written by Gamma and Weinand [Gamma88a] uses simple robust iterators [Kofler92a], as does MacApp and other frameworks. ET++ puts the responsibility of robustness on the iterator. An iterator in ET++ must register into a database and continuously inform the database about its state (active, ready, or terminated) [Kofler92a]. The state of an iterator in ET++ is similar to a thread in a concurrent system. Nissen [Nissen97a] suggests that certain iterators, so-called "safe iterators", notify each other when they change the underlying data structure, thus avoiding inconsistencies.

We suggest that iterators should not primarily consider robustness but that robustness concerns instead should be handled in separate accessor objects. An accessor object is an abstraction of the elements in a data structure. It is the responsibility of the accessor to grant only safe access. Furthermore, we suggest that structural modifications are not allowed for data structures and we hence plea for a functional (stateless) view of algorithms.

An algorithm can be viewed as a function, but it can equally be viewed as an operation that can be applied to some data (see Figure 7).

Figure 7
Figure 7. Dual view of an algorithm.

A problem arises when we view an algorithm as a function that maps input to output. Some algorithms, for example sorting algorithms, operate directly on the input. Instead of separating so-called "mutating algorithms" from other algorithms, we can add transparency wrappers around data structures. Robustness is critical when an algorithm is mutating the input. We need a mechanism that allows for safe mutation and introduces the concept of delta-mutation. Delta-mutation of a data structure means that a separate logical copy is maintained inside the algorithm in the form of a difference list. That copy appears as a separate (cloned) data structure.

Figure 8
Figure 8. Two approaches to transparency. Difference list (to the left) allow the algorithm to seemingly operate directly on the input, although a separate copy is maintained and returned. The generator approach (to the right) instead restricts the algorithm to process only lists, and put the reconstruction of structure on a generator object.

In this section, we demonstrate generic implementations of some elementary algorithms. During the study of these elementary algorithms we make some observations that lead us to a classification of visitors and iterators in the following sections. We view algorithms as functions that map input to output and classify algorithms depending on the type of input and output. The most trivial form of mapping is element to element. Other mappings include graph to graph and list to element.

Newton-Raphson's Method
A very specific group of algorithms can be represented by Newton-Raphson's method. The input to such algorithms is an equation or, more generally, a mathematical expression. The output is not tied to the structure of the input expression. Therefore, iterators may appear useless. However, Newton-Raphson's method does iterate, but not over the expression. The input is also the number of steps we want in the computation. These steps can be described as an interval, i.e., a collection (without actual elements). Iteration over such an interval corresponds to the generic algorithm definition. This leads to our first observation:

Observation 1

Even iterative algorithms can be implemented using iterators over intervals. Iterators then control the execution of computation.
List to Element

Binary Search
Binary search can apply to any sequence with some linear ordering property. Binary search relies on the partitioning of a data structure into smaller and smaller partitions. Partitions are subsequences of elements picked from some data structure. Partitions are themselves sequences allowing for recursion. Elements in a partition can be picked in any order, for example {1, 3, 5, 7}. In binary search, a partition is a consecutive subsequence of the previous partition. Binary search makes no difference between a partition and the original input sequence, thus allowing us to recursively apply the same computation on smaller and smaller partitions. The final partition has only one element. Such a partition is not the expected output of the algorithm. Either the output is the position where the element was found or it is nothing at all. Therefore, before terminating, the partition must track its corresponding position in the input sequence, if such exists. This leads to a second observation:

Observation 2

The result of iteration and of applying visitors sometimes needs to be transformed into the expected form of the output.
In summary, binary search algorithms recursively apply the same computation on polymorphic (sub-) sequences and end up with a partition that needs to be transformed into a valid output. We therefore identify three components in binary search: the iterator that splits a sequence into two equally-sized parts, the partitioning step performed when one half is visited, and the transformation step when the intermediate result is transformed into the expected output.

Boyer-Moore Matching
A sequence-matching algorithm like Boyer-Moore moves forward in a sequence and attempts to match a predefined block. Binary search terminated when there were no more possible partitions. For simplicity, Boyer-Moore is generalized as an algorithm that terminates after the first match is found. In binary search, iteration was separated from the visit into a partition. Iteration simply provided a visitor with a partition, and a visitor then triggered another iterator to do the same thing over again. The iteration over possibly matching blocks in Boyer-Moore is different. There is an outer iteration determining the position in the sequence, and an inner iteration determining the position in the current block. These two iterations are clearly separated in a naive matching algorithm because the block iterator moves only one element ahead at a time. Boyer-Moore matching does better. The number of elements to move forward in the outer iteration depends on the result of the inner iteration, thus avoiding redundant work. The inner iteration is part of a visit to one of many blocks determined by the outer iteration. For Boyer-Moore, the bad-character heuristic tells the outer iteration how many steps to move forward. It may move ahead more than one element if bad characters exist.

Figure 9
Figure 9. Boyer-Moore matching using visitors and iterators. The left- most iterator traverses the input, and the right-most iterator traverses the current block attempting to find a mismatch. The iterators work in opposite directions.

Note the need for communication between iterators and visitors when the visitor performs a computation of importance for determining the next step in the iteration. Although an iterator should be an independently extensible component in an algorithm, it sometimes needs to be intertwined with a visitor. We need to compromise and provide a visitor with an explicit connection to an iterator. Such a visitor actively participates in the iteration, but it is wrapped around an arbitrary iterator, thus making the iterator independently extensible. This leads to a third observation:

Observation 3

Iterators sometimes need to communicate with visitors, compromising on isolation. Such communication should happen through predefined and minimized interfaces.
This example demonstrates one weakness with the decomposition of algorithms into iterators and visitors: they are not always isolated from each other.

List to List
One important category of list-to-list algorithms are sorting algorithms. Sorting algorithms can be decomposed further. For example, merge sort has two separate steps, divide and conquer, where divide means splitting a data structure into partitions and conquer means merging them together again. Quicksort can be decomposed into partitioning and finding the k:th element. Exchange sorting can be generalized as comparator and iterators only.

Partitioning is an algorithm. For example, pivot partitioning for quicksort and division in merge sort clearly fall into our definition of an algorithm.

Figure 10
Figure 10. A possible hierarchy of partitioning classes. Delta-structures seamlessly represent modified copies of data structures.

Merge Sort and Quicksort
Merge sort and quicksort are variations of the same idea. Both involve recursively partitioning a list into smaller lists. Quicksort mutates the partitions to find a pivot element whereas merge sort, being a divide and conquer algorithm, operates in two separate steps. The first step in merge sort is to partition elements, and the second is to combine (merge) them into sorted order. Obviously, partitioning is common to both algorithms.

Figure 11
Figure 11. Decomposing the quicksort algorithm. Recursion corresponds to iteration over a partition list.

Sorting Networks
Sorting networks are generalizations of a certain kind of sorting algorithm. They are generic algorithms because they can be represented as a collection of iterators and comparators. Sorting networks requires pair wise iterations. A sorting network repeatedly iterates over the input and performs a comparison and optionally an element exchange at each step. The iterator is the central abstraction in sorting networks, not the visitor. Sorting networks can iterate in a serial or a parallel fashion. Parallel iterations can be represented using concurrent processes. Serial iterations require an encapsulating iteration over the pair wise iterations.

List to Tree

Huffman Algorithm
The Huffman algorithm is a typical, but trivial, example of an algorithm that accumulates partial results. Although the Huffman algorithm is for compression of sequential data—it is in fact a list to list algorithm—it can also be seen as a higher-order function that maps a sequence of uncompressed data to a prefix tree. A prefix tree is not the expected output of the Huffman compression, it is just a partial result, but the further computation is straightforward. During the iteration over the uncompressed sequence we must represent that partial result, which is the prefix tree. A visitor object can encapsulate a state, and therefore can accumulate the result of every previous visit to an element, leading us to a forth observation about visitors.

Observation 4

If the same visitor object is applied to every element, then the state of that visitor can accumulate the results of iteration.
A visitor is therefore suitable as an accumulator of partial results, and as such it is strongly related to the builder design pattern [Gamma95a].

Graph to Tree
Graphs are a challenge for the iterator abstraction. In the search for graph iterators, several taxonomies exist [Melhorn99a], or have been suggested [Kühl96b]. Kühl and Weihe [Kühl96b] suggest that graph iterators are separated into node iterators and edge iterators, both linear, and adjacency (structural) iterators [Nissen97a]. This separation is interesting because every tree or list structure is also a graph. However, representing lists as degenerated graphs forces a lot of unnecessary complexity upon a user. Therefore, we think that a structural iterator alone cannot provide useful polytypism. Nodes are often considered to be the elements of a graph. Edges determine the structure of a graph. For example, certain configurations of edges make a graph appear like a list or a tree. However, edges can have values in a weighted graph and directions in a directed graph. Weight values can be determined by a weight function (i.e., can be represented as a separate association list). Iteration over a graph therefore means visiting nodes or edges. Iterating over the weights, on the other hand, means traversing a separate weight list. A generic map operation can be applied to a graph resulting in an isomorphic graph with different data. The operation must apply to elements. Because a graph has two kinds of elements, nodes and edges, we must specify which we want to consider for the operation. A map operation then applies a function to every iterated element and places the result in a list. When all elements have been traversed, a generator that reconstructs the structure of the original graph processes the list. We can iterate over nodes in a graph using elementary iterators such as depth first search, breath first search and weighted order. A graph then appears as a list and many list operations apply, e.g., accumulate. However, without further support, list operations, not knowing about the structure of the currently traversed data structure, give us degenerated output. For example, a map operator that maps a graph into a list is not good enough. We expect an isomorphic data structure as the outcome of map. An iterator destroys information about the structure of the graph. This information can be passed as a parameter to the algorithm. In EFLIB, we provide a generator object with every data structure. A generator object constructs what an iterator destructs: the structure of our data. For example, a generic map operation can request an isomorphic generator for the graph. The isomorphic generator rebuilds an output graph with the same structure as the input graph, the elements left alone. We believe that generators are important not just because they allow for polytypism, but that they are also building blocks that can be reused and replaced, allowing for flexibility and usability. An algorithm can be parameterized with respect to a set of generators, for example. Changing the parameters then means turning the algorithm into something very different, which is reusing the core of the algorithm.

Minimum Spanning Tree
MST problem transforms a graph into a tree. However, the standard MST algorithms are inherently sequential as Figure 12 shows. MST can be implemented using iterators and visitors because the core of the algorithm involves iterating over edges with associated weight. The visitor accumulates non-repetitive spanning edges in disjoint sets. The output is a graph reduced to only contain nodes inside the accumulated sets. Not all graph algorithms are as simple as MST because any algorithm that considers both edges and nodes, or advanced adjacency, cannot easily be sequentialized without losing important information.

Figure 12
Figure 12. Minimum spanning tree. A minimum spanning tree is easily solved using a sequential algorithm, iterating over a table of edges.

Malcolm shows how Bird's theory of lists can be generalized into any data type, giving rise to true polytypism. This is not possible with mere iterators. An iterator can traverse any data structure appearing as if it was a sequence. This generalization is not polytypic. The type is cast and thus ignored. Allowing for polytypism in an object-oriented algorithm framework requires data structures that are unified in the same interface. In functional programming, a polytypic algorithm is parameterized with respect to a constructor for a data type, e.g., a tree. The question is whether the iterator abstraction is really suitable for polytypism. We need to discover a way of rebuilding and preserving the structure of data types. Our suggestion is that the solution can be found in the decomposition of data structures into generators and lists.

We have shown that generic algorithms represented as compositions of iterators and visitors are powerful because they allow us to reuse, understand and adapt algorithms. This allows us to build frameworks with algorithms instead of conventional class libraries. One approach to generic algorithms exists in STL. STL claims to be "open and orthogonal and thus extensible" [Musser96a, pg. 46], but STL relies on templates and provides only fundamental algorithms and function objects for lists (sequences). STL lacks infrastructure and facilities for adaptation and flexibility and is therefore not a framework. We have demonstrated that generic algorithms can be implemented in an algorithm framework thus allowing for better flexibility, increased simplicity and better reuse. Much research on algorithm decomposition exists in the area of functional programming, but the ties to object-orientation are relatively unexplored. For example, Jay [Jay97a] discusses the implementation of the visitor design pattern from a functional background and briefly compares two other approaches in object-oriented programming, but the survey only considers the visitor pattern in principle. Kühl and Weihe [Kühl96a] give a much more comprehensive survey on algorithm reuse and suggest a decomposition of algorithms. Algorithms are identified as loop kernels with replaceable subcomputations. This is basically our view also. The approach we've presented includes decomposition into iterators and visitors reminiscent of subalgorithms and loop kernels in Kühl and Weihe's paper, but our approach also considers decomposition of data structures. Saying that "algorithms should be reused but not the underlying data structure" satisfies Kühl and Weihe. We have argued that truly generic algorithms require generic data structures as well, in the sense that both algorithms and data structures must be decomposed to become reusable. We showed that iterators can be regarded as separate and substitutable parts of an object-oriented generic algorithm, similar to recursion patterns in functional programming—which can be captured using polytypic language constructs, etc.—and similarly generators can be regarded as part of an object-oriented data structure. This gives rise to a model for generic algorithm programming that captures some of the ideas from functional programming (polytypism, compositions of algorithms), using object-oriented framework-like libraries of data structures and algorithms, which facilitates software reuse.

This work was financed by Stockholm University, Royal Institute of Technology, and partly by NUTEK, the Swedish National Board of Technical Research and Development. This research took place at Uppsala University in March to April 1999, and the resulting paper was submitted to JOOP in February 2000.


  1. [Aho83a] Alfred V. Aho, John E. Hopcroft, and Jeffrey D. Ullman, Data Structures and Algorithms, Addison–Wesley, Reading, MA, 1983.
  2. [Bird87a] Richard S. Bird, "An introduction to list theory." In M. Broy, editor, Logic of Programming and Calculi of Discrete Design, volume F36 of NATO ASI Series. Springer–Verlag, 1987.
  3. [Bird96a] Richard S. Bird and Oege de Moor, Algebra of Programming, Prentice Hall International, 1996.
  4. [Bird96a] Richard S. Bird, Oege de Moor, and Paul Hoogendijk, "Generic functional programming with types and relations," Journal of Functional Programming, 6(1):1–28, January 1996.
  5. [Budd91a] Timothy A. Budd, An Introduction to Object-Oriented Programming, Addison–Wesley, 2nd edition, 1997. First edition published in 1991.
  6. [Frich94a] Arne Frick, Walter Zimmer, and Wolf Zimmermann, Karla: An extensible library of data structures and algorithms, Technical report, Universität Karlsruhe, Fakultät für Informatik, 1994.
  7. [Frick94b] Arne Frick, Walter Zimmer, and Wolf Zimmermann, On the design of reliable libraries, Technical report, Universität Karlsruhe, Fakultät für Informatik, 1994.
  8. [Gamma95a] Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides, Design Patterns: Elements of Reusable Object-Oriented Software, Addison–Wesley, Reading, MA, 1995.
  9. [Goldberg89a] Adele Goldberg and Dave Robson, Smalltalk-80: The Language, Addison–Wesley, 1989.
  10. [Hoogendijk97a] Paul Hoogendijk, A Generic Theory of Data Types, PhD thesis, Eindhoven University of Technology, 1997.
  11. [Jansson96a] Patrick Jansson and Johan Jeuring, "Polytypic programming." In L Launchbury, E. Meijer, and T. Sheard, editors, Proceedings of the Second International Summer School on Advanced Functional Programming Techniques, Lecture Notes in Computer Science, pages 68–114. Springer–Verlag, 1996.
  12. [Jay94a] C.B. Jay and J.R.B. Cockett, "Shapely types and shape polymorphism." In D. Sannella, editor, Programming Languages and Systems—ESOP '94: 5th European Symposium on Programming, Edinburgh, U.K., April 1994, Proceedings, Lecture Notes in Computer Science, pages 302–316. Springer–Verlag, 1994.
  13. [Knuth97a] D.E. Knuth, Fundamental Algorithms, Volume 1 of The Art of Computer Programming. Addison–Wesley, Reading, Mass., 3rd edition, 1997.
  14. [Kofler92a] Thomas Kofler, Robust iterators in ET++, Technical report, Ubilab, Union Bank of Switzerland, June 1992.
  15. [Kühl95b] Dietmar Kühl and Karsten Weihe, "Using design patterns for reusable, efficient implementations of graph algorithms," Konstanzer Schriften in Mathematik und Informatik, (1), January 1996.
  16. [Kühl96a] Dietmar Kühl and Karsten Weihe, "Iterators and handles for nodes and edges in graphs," Konstanzer Schriften in Mathematik und Informatik, (15), September 1996.
  17. [Larsson99a] Johan L. Larsson. Framework reuse and the foundation of EFLIB, March 1999. NUTEK report.
  18. [Larsson0a] Johan L. Larsson, "An Introduction to Customization Design Patterns in EFLIB," Journal of Object-Oriented Programming, September 1999.
  19. [Malcolm90a] Grant Malcolm, Algebraic Data Types and Program Transformation, PhD thesis, Rijksuniversiteit Groningen, 1990.
  20. [Melhorn95a] Kurt Melhorn and Stefan Näher, "A platform for combinatorial and gemoetric computing," Communications of ACM, (38):96–102, 1995.
  21. [Melhorn98a] Kurt Mehlhorn, Stefan Näher, Michael Seel, and Christian Uhrig, The LEDA User Manual Version 3.7.1, 1998. Available on the LEDA web page.
  22. [Melhorn99a] Kurt Mehlhorn and Stefan Näher, The LEDA Platform of Combinatorial and Geometric Computing. Cambridge University Press, 1999.
  23. [Murer96a] Stephan Murer, Stephen Omohundro, David Stoutamire, and Clemens Szyperski, "Iteration abstraction in Sather," ACM Transactions of Programming Languages and Systems, 18(1):1–15, January 1996.
  24. [Musser94a] David. R. Musser and Alexander A. Stepanov, "Algorithm-oriented generic libraries," Software Practice and Experience, 24(7):623–642, July 1994.
  25. [Musser96a] David R. Musser and Atul Saini. STL Tutorial and Reference Guide, Addison–Wesley, 1996.
  26. [Nissen96a] Marco Nissen and Karsten Weihe, "Combining leda with customizable implementations of graph algorithms." Konstanzer Schriften in Mathematik und Informatik, (17), October 1996.
  27. [Nissen97a] Marco Nissen, Graph iterators: Decoupling graph structures from algorithms, Master Thesis, Max Planck Institute of Computer Science, 1997.
  28. [Palsberg98a] J. Palsberg, C.B. Jay, and J. Noble, "Experiments with generic visitors." In R. Backhouse and T. Sheard, editors, Workshop on Generic Programming: Marstrand, Sweden, 18th June, 1998, pages 81–84, 1998.
  29. [Palsberg98a] Jens Palsberg and C. Barry Jay, "The essence of the visitor pattern." In Proceedings of COMPSAC'98, 22nd Annual International Computer Software and Applications Conference, pages 9–15, Vienna, Austria, August 1998.
  30. [Stroustrup90a] Bjarne Stroustrup and Magaret A. Ellis, The Annotated C++ Reference Manual, Addison–Wesley, 1990.
  31. [Weihe97a] Karsten Weihe, "Reuse of algorithms: Still a challenge to object-oriented programming." In Proceedings OOPSLA '97, ACM SIGPLAN Notices, pages 34–48, 1997.
  32. [Weihe98a] Karsten Weihe, "A software engineering persctive on algorithmics," Konstanzer Schriften in Mathematik und Informatik, (50), January 1998.
  33. [Weinand88a] Andre Weinand, Erich Gamma, and Rudolph Marty, "ET++—an object-oriented application framework in C++." In Norman Meyrowitz, editor, Proceedings OOPSLA '88, ACM SIGPLAN Notices, volume 23, pages 46–57, San Diego, California, November 1988.