Iterators: One success story

Iterators intrigue me because they seem to bring to C++ facilities that I’ve found congenial in other languages. I’ve spent some time in Haskell and a lot of time in Python, which both support list comprehensions. I’ve also enjoyed the limited amount of C# Language-Integrated Query (LINQ) programming that I’ve done.

Yet when I come to actually use the STL iterators, it feels awkward and requires much more text than achieving comparable results using list comprehensions or LINQ. Some of that may be simple unfamiliarity but that is itself a problem: A good design will allow me to apply patterns I’ve learned in other languages.

But it’s more than unfamiliarity. I’ve practiced implementing functions in the C++ STL and found that simple Python expressions such as

[x*x for x in a if x > 0]

have to be turned inside-out and sprawl across multiple lines when written in the STL. (I’ll give details in a later post.) The STL appears essentially much more verbose.

I think it’s also significant that, despite the STL’s design being 20 years old, no other mainstream language has adopted it. Deep down, Python’s list comprehensions are in fact constructed from similar mechanisms, iterator types and generator expressions, but the underlying memory management is completely different.

So I want to spend some posts exploring the strengths and weaknesses of STL iterators. And I want to begin with one of its great successes, a success that became available with the range-based for feature of C++ 2011.

A case where iterators work well

Consider the following C program:

#define LEN 3

int main () {
  int a [LEN] = {4, 5, 6};
  int sum  = 0;
  for (int* p = a; p != a+LEN; p++)
    sum += *p;

  return sum;
}

The loop simply walks a pointer p over the length of array a, summing the values in sum. (Note: The above code is so simple that the optimizer does something much simpler. The Appendix to this post shows the actual code necessary to fake out the optimizer and get the compiler to generate the code I’m describing. I’m displaying the simpler code here because it presents the essential loop.) The resulting loop code (from gcc 6.2) is as simple as it gets:

  xorl	%eax, %eax
L3:
  addl	(%edx), %eax
  addl	$4, %edx
  cmpl	%esi, %edx
  jne	.L3

Register edx is pointer p, incrementing through the array. Register eax is sum, while register esi contains the address a+LEN. This is the most efficient scalar code possible for the loop (though it could likely be improved by using the x86 vector instructions or your processor’s equivalent).

Here is the same algorithm using the C++ STL library and a range-based for:

#include <array>

using std::array;

int main () {
  array<int,3> a {4, 5, 6};

  int sum = 0;
  for (auto v : a)
    sum += v;

  return sum;
}

This version generates equivalent loop code (though with different register assignments) as the C program, despite its more abstract presentation. In fact, C++ will generate equivalent code to C’s pointer-based program for a loop over a std::vector as well:

#include <vector>

using std::vector;

int main () {
  vector<int> a {4, 5, 6};

  int sum = 0;
  for (auto v : a)
    sum += v;

  return sum;
}

Where the std::array collection is just mild syntactic sugar over a foundational C/C++ array, a std::vector is a more powerful collection, expanding in size as new elements are appended. Yet a loop over a vector generates just as efficient code as the far more explicit, pointer-oriented code in C.

Iterators link the STL collections and ranged for (and all the algorithms in the algorithms library). A collection provides a suite of iterators according to a standardized interface and the ranged for simply calls those iterators. Template specialization allows the library to specify efficient, pointer-based iteration over collections for which that makes sense.

This is a real benefit of STL iterators: We specify an abstract interface to every STL collection and for those collections represented by a contiguous block of elements, such as std::array or std::vector, we get code every bit as efficient as the best that we can get from any language. For collections based on non-contiguous representations, such as std::map or std::unordered_map, the same ranged-for can be used but will generate the more complex traversal code required for a binary tree or hash table, respectively.

Once we have iterators, a tweak to the concrete syntax of the core language provided a concise form for iterating over a collection, the ranged for. This loop is more robust than the C for loop, as the start and end points of the loop are generated by the collection. By contrast, the C loop (and the pre-2011 for loop in C++) requires us to correctly specify the end point of the array, with no provision for the compiler to cross-check.

This is an unequivocal win for the iterator design. In future posts, I’ll consider functional requirements where it becomes far less successful.

Appendix

The C program given above in fact generates the following object code:

  .cfi_startproc
  movl	$15, %eax
  ret
  .cfi_endproc

The smarty-pants optimizer has recognized that the loop is completely specified and has precomputed its result (the value 15) at compile-time. To generate an actual runtime loop, we have to make the array larger and read it from input:

#include <stdio.h>

#define LEN 10

int main () {
  int a [LEN];

  int res;
  for (int i = 0; i < LEN; i++)
    res = scanf ("%d", &a[i]);

  int sum  = 0;
  for (int* p = a; p != a+LEN; p++)
    sum += *p;

  return sum;
}

The resulting loop generates the code given in the body of this post.

A similar problem arises with the C++ code. Here is the version that prevents the optimizer from eliminating the loop:

#include <array>
#include <iostream>

using std::array;
using std::cin;

int main () {
  array<int,10> a {};
  for (auto& v : a)
    cin >> v;

  int sum = 0;
  for (auto v : a)
    sum += v;

  return sum;
}