A single STL statement equivalent to the basic Python list comprehension

In the last post, I described the structure of the basic Python list comprehension

[expression1(var) for var in expression2 if condition(var)]

and lamented that all the straightforward translations to the STL algorithms required two statements, one for the filter and one for the computation.

These “straightforward” translations use the std::transform() algorithm, which seemingly enforces a one-to-one mapping from source to sink. But this one-to-one mapping is not really enforced by the algorithm but instead by the insertion iterator for the sink. If the inserter would only output items that passed the filter, the basic Python list comprehension could be accomplished in a single STL transform(). To produce this, the filter expression, a C++ lambda, must communicate to the insertion iterator whether a value has passed the filter. The iterator would only insert passed values.

We can use the C++ 17 std::optional type (available in earlier versions of the language via boost::optional) as the link between the filter and iterator. In addition, we need a version of back_insert_iterator() that accepts an optional value and only performs a push_back() when its argument contains a value. I built one starting from Andre Tomazo’s back_emplace_iterator() (code provided in the Appendix).

With these tools, the list comprehension from the last post

def func(lst):
    a = [(x[0], x[1]*x[1]) for x in lst if x[0] != 'US']
    return sorted(a)

if __name__ == '__main__':
    a = func(zip(['BC','NB','BC','US','NB','BC','BC'],range(7)))
    print (a)

can be written in the STL as

using ppair = std::pair<string,int>;
using oppair = optional<ppair>;
using plist = std::vector<ppair>;

ostream& operator<<(ostream& os, const ppair& pp) {
  os << "(" << pp.first << ", " << pp.second << ")";
  return os;
}

ostream& operator<<(ostream& os, const plist& pl) {
  if ( ! pl.empty()) {
    cout << pl[0];
    for (auto p = pl.cbegin() + 1; p != pl.cend(); ++p)
      cout << ", " << *p;
  }
  return os;
}

int main () {
  plist src {
    ppair{"BC", 0},
    ppair{"NB", 1},
    ppair{"BC", 2},
    ppair{"US", 3},
    ppair{"NB", 4},
    ppair{"BC", 5},
    ppair{"BC", 6}
  };

  plist res {};
  res.reserve(src.size());
  std::transform(src.cbegin(), src.cend(), opt_back_inserter(res),
    [](auto p) {
      if (p.first == string("US"))
        return oppair();
      else
        return oppair(ppair{p.first, p.second*p.second});
    } );

  std::sort(res.begin(), res.end());

  cout << res << '\n';
}

The second line defines a convenient alias for optional, oppair. The lines corresponding to the list comprehension, Lines 30–40, no longer have a std::copy_if() statement but only a std::transform(). The sink for the transform is now an opt_back_insert_iterator and the filter lambda returns an oppair that indicates whether the value passed the filter.

Once a programmer has acquired the transform/opt_back_insert_iterator idiom, a given use only requires them to derive the lambda combining the filter and and computation. The lambda is essentially the Python list comprehension annotated with types.

This idiom allows you to write C++ expressions equivalent to basic list comprehensions with about the same complexity. There is a lot of boilerplate the in C++ code—the definition of opt_back_insert_iterator, the call to res.reserve(), and the elaborate std::transform() call—that is automatically provided by Python, but that code is the same for any use. All the customization is contained in the lambda. The programmer does have to choose between the various approaches to building the result vector, though the reserve/insert-back one used here is a good default.

The method has introduced many abstractions: the algorithm, the lambda, the optional type, the inserter template. Do these bloat the code or can the compiler reduce them to their basic constructs? And how does the generated code compare with the code generated from a basic for loop? It depends upon the values we are passing through the expression. I will explore the generated machine code in the next post.

Appendix: Full code

The full code for the above program, including the definition of opt_back_insert_iterator(). Take care when using this code with Standard Library implementations, such as libstdc++ 6.2, that do not support the has_type() member of optional. In that case, the operator=() member function template simply requires a type that supports both a boolean conversion and a value() member. This may include types that are not implementations of some optional type.

/*
  Example of passing optional values to OutputIterator 
 */

#include <algorithm>
#include <iostream>
#include <iterator>
#include <string>
#include <tuple>
#include <vector>

#include <experimental/optional>

using std::cout;
using std::ostream;
using std::string;

using std::experimental::optional;

/*
  Class opt_back_insert_iterator derived from Andre Tomazos's
  back_emplace_iterator():
  http://stackoverflow.com/questions/18728257/back-emplacer-implementation-default-operator-vs-universal-reference-version

  with following changes:
  1. operator=() expects an optional<> type and uses push_back() only when
     argument contains an actual value.
  2. Deprecated (as of C++ 17) std::iterator derivation replaced by
     explicit using declarations.
  3. Typedefs converted to using declarations.
  4. "class" in template parameters replaced with "typename"
  5. Formatting slightly modified.

  libstdc++ 6.2 implementation of optional<> does not have "has_value()" member.
  When full C++17 implementation is available, undefine
  OPT_BACK_INSERT_NO_HAS_VALUE.
 */

#define OPT_BACK_INSERT_NO_HAS_VALUE

#ifdef OPT_BACK_INSERT_NO_HAS_VALUE
#define HAS_VALUE(t) (bool(t))
#else
#define HAS_VALUE(t) (t.has_value())
#endif

template<typename Container>
class opt_back_insert_iterator
{
protected:
  Container* container;

public:
  using iterator_category = std::output_iterator_tag;
  using value_type = void;
  using reference = void;
  using pointer = void;
  using difference_type = void;

  using container_type = Container;

  explicit opt_back_insert_iterator(Container& x) : container(&x) {}

  template<typename T>
  using _not_self =
    typename std::enable_if<
      !std::is_same<
        typename std::decay<T>::type,
        opt_back_insert_iterator
       >::value
    >::type;

  template<typename T, typename = _not_self<T>>
    opt_back_insert_iterator<Container>&
    operator=(T&& t)
    {
      if (HAS_VALUE(t))
        container->push_back(std::forward<T>(t).value());
      return *this;
    }
    // ========================================

    opt_back_insert_iterator& operator*() { return *this; }
    opt_back_insert_iterator& operator++() { return *this; }
    opt_back_insert_iterator& operator++(int) { return *this; }
};

template<typename Container>
inline opt_back_insert_iterator<Container>
opt_back_inserter( Container& c )
{
    return opt_back_insert_iterator<Container>(c);
}

using ppair = std::pair<string,int>;
using oppair = optional<ppair>;
using plist = std::vector<ppair>;

ostream& operator<<(ostream& os, const ppair& pp) {
  os << "(" << pp.first << ", " << pp.second << ")";
  return os;
}

ostream& operator<<(ostream& os, const plist& pl) {
  if ( ! pl.empty()) {
    cout << pl[0];
    for (auto p = pl.cbegin() + 1; p != pl.cend(); ++p)
      cout << ", " << *p;
  }
  return os;
}

int main () {
  plist src {
    ppair{"BC", 0},
    ppair{"NB", 1},
    ppair{"BC", 2},
    ppair{"US", 3},
    ppair{"NB", 4},
    ppair{"BC", 5},
    ppair{"BC", 6}
  };

  plist res {};
  res.reserve(src.size());
  std::transform(src.cbegin(), src.cend(), opt_back_inserter(res),
    [](auto p) {
      if (p.first == string("US"))
        return oppair();
      else
        return oppair(ppair{p.first, p.second*p.second});
    } );

  std::sort(res.begin(), res.end());

  cout << res << '\n';
}