Lambdas: Sweet, sweet syntactic sugar

23 Feb 2017 Tags: C++

Update: The next post provides recipes for common use cases of lambdas. If parts of this post seem too abstract, you might find the examples in the next post clarify things.

STL iterators and algorithms really can’t be discussed without also talking about lambdas. Although the STL design predates the introduction of lambdas by almost two decades, the introduction of lambdas in C++ 2011 improved the usability of STL algorithms substantially. I don’t have statistics but I presume that the STL algorithms now receive substantially more use due to the greater simplicity of passing callbacks as lambdas rather than instances of named classes. Lambdas are purely a matter of convenience—they introduce nothing to the language that you couldn’t do before—but by making common use cases more convenient, they effectively extend the range of use cases where calling a standard algorithm makes sense.

Programmers accustomed to languages like Python, JavaScript, or Scheme might counter that C++ lambdas retain an awkward complexity compared to the corresponding facilities in the other languages. What’s up with the by-value and by-reference free variables, and the mutable keyword? Scheme gets on fine without those. Ah, [and here the Python/JavaScript/Scheme … programmer smiles good-naturedly at their benighted C++-using colleague] it’s just that C++ programmers seem to revel in a foolish complexity. But [shrug] what can you do?

I’m not ready—yet—to defend or critique the concrete syntax of C++ lambdas but I think we need to understand that a C++ lambda has been delegated a greater responsibility by the core language than a Scheme lambda. Where lambdas in Python, JavaScript and Scheme all define anonymous functions, lambdas in C++ define an instance of an anonymous callable class. The same issues must arise in Java and C#, though I am unfamiliar with the specifics of their solutions. Analyzing the design of C++ lambdas will help us understand a deep distinction between it (and its object-oriented cousins Java and C#) and languages like Python, JavaScript, and Scheme.

Who builds the closure: The language or a class constructor?

Consider the Python lambda expression

lambda i : i * c

Variable i is a parameter, while c is a free variable. Whenever a language permits lambdas—indeed, whenever a language permits defining functions of any sort inside another function—its designers have to make two decisions regarding free variables:

What is their scope?
How are they bound to values at runtime?

Scheme was one of the first languages to recognize the depth of these questions and to answer them in a coordinated, elegant way. In fact, the designers consider these choices to be so foundational to the language that the first substantive paragraph of the language’s defining document reads, in its entirety:

Following Algol, Scheme is a statically scoped programming language. Each use of a variable is associated with a lexically apparent binding of that variable.

Python adopts the same rules, so I’ll illustrate with Python code:

def foo(c):
    l = lambda i : i * c
    return l

c = 1
a = range(4)
res = map(foo(2), a)

print (", ".join(map(str, res)))

What value will this print? It depends upon whether the free c in the lambda expression binds to the parameter c (line 1), whose value is 2, or the global c (line 5), whose value is 1. Python uses Scheme’s notion of “lexically apparent binding”, binding the free c to the parameter c and printing 0, 2, 4, 6.

This choice makes higher-order functions more useful but it complicates the language runtime. In particular, local variables can no longer be stored on a stack and require more elaborate storage management.

If your spidey-sense tingled at the phrase, “more elaborate storage management”, you have a strong intuition for C++. The language’s designers are unlikely to adopt a semantics that presumes “elaborate storage management”. But the choice is much broader than just lambda semantics; it is a choice of whether the binding of names to values is handled by the language or the programmer.

(Aside: A couple of notes about Python: First, Python lambdas are deliberately restricted compared to those of every other language I describe here. Those restrictions don’t limit them in the context here of list comprehensions and higher-order list functions such as map. Second, Python’s name-value binding is a hybrid of object-oriented and functional. Scheme represents a purer exemplar of what I describe but fewer people know that language. Again, within the context of list comprehensions, Python’s free variable bindings exemplify the differences I describe here, even if the overall language does not.)

In programming language theory, the data structure binding names to values is called an environment. In Scheme, Haskell, and other languages with a strong functional programming basis, the environment is managed by the language and programmers never explicitly work with them. Environments have no names and you cannot manipulate them directly. But the programmer has to understand them enough to make sense of free variable bindings.

When a function result is a function value that includes free variables, the result refers to an environment that defines the values bound to those free variables. The environment (at least the part representing bindings of the free variables) must be retained at least as long as the function value exists. The (function value, environment) pair is a closure. A simple stack regime is insufficient to represent closures because the environment bindings in the closure have to persist even after the function that created those bindings has completed and returned to its caller.

Unlike languages with a strong functional basis, C++ reifies its environments as member variables of an object. Rather than defining a function and using a closure to complete that definition via an environment binding the free variables, C++ uses callable objects. Where a Python lambda binds its free variables to the lexically nearest variables, carrying the resulting environment in a closure, a C++ lambda is an object with member variables providing the values of all free variables in the lambda expression. This fits more naturally within an object-oriented language: Rather than invisible environments whose presence is only indicated by the meaning they provide a function value, we have object instances like any other.

A C++ lambda expression defines a new, unnamed class and instantiates a single object of that class. The class has a single member function, the function body of the lambda. The prefix to the lambda defines the member variables of the class. This is the source of the extra complexity in C++ lambda syntax compared to Python/JavaScript/Scheme: In addition to defining a function, the C++ lambda has to define the member variables providing values for any free variables in the function body and specify how they are to be initialized.

Defining a lambda object in C++

Here’s how the Python functions defined above could be defined in C++:

auto foo (int c) {
  return [c] (auto i) { return i * c; };
}

int main () {
  auto c = 1;
  vector<int> a {1, 2, 3, 4};
  vector<int> res {};
  transform (a.cbegin(), a.cend(), back_inserter(res), foo(2));

  copy (res.cbegin(), res.cend(), make_ostream_joiner(cout, ", "));
  cout << '\n';
}

(The ostream_joiner in Line 11 is a C++ 2017 feature that is a close analogue of Python’s str.join() function.)

Let’s highlight the lambda in Line 2:

[c] (auto i) { return i * c; }

Where the Python lambda uses the rules of Python environments to bind the free c to the parameter c, the C++ lambda prefix [c] specifies that the callable object will have a member variable c with the following properties:

c in the lambda will be copy-constructed (the default) from the lexically closest c, namely the function parameter, taking whatever value the parameter has at the time the lambda is built.
c will be const (the default).
The free variable c in the function body refers to the member c.

Given that the sole purpose of the lambda object’s member variables is to provide bindings for free variables in the function body, the lambda syntax only supports a subset of member variable definitions most useful for this purpose. You can specify that the member variables are either copy-initialized or by-reference. By default, all copy-initialized values are const, but if any need to be updated, declaring the lambda mutable makes them all mutable. Member variables that are references are always mutable as long as the values they reference are.

A lambda object whose member variables are all copy-constructed is self-contained and can have arbitrary lifetime; it can persist long after the original sources of its member variables have been deallocated. Lambda objects with member reference variables can only have lifetime within the lifetimes of all of the variables that it refers to. Consult the reference for lambda expressions for the details.

Conclusion

The greater complexity of C++ lambda syntax is due to its more complicated semantics. More than defining a function, you are defining a class and the initialization of its member variables. The concrete syntax for lambdas aims to provide a concise way to specify the kinds of classes most likely to be used to construct anonymous, callable objects. Understanding that it’s simply an alternate style of class definition gives it all more sense.

All my marbles in one place

Lambdas: Sweet, sweet syntactic sugar

Who builds the closure: The language or a class constructor?

Defining a lambda object in C++

Conclusion