A simpler, fourth idiom for appending

This morning, I realized that there is a fourth approach to appending items to a vector, one that combines elements of the first and third idioms. I’m embarrassed to have missed it, because it is a longstanding, well-understood approach that is particularly applicable to the kinds of “list comprehension algorithms” I’m considering in this series.

Reviewing the output of the first method shows that a large proportion of the extra effort comes from the incremental allocation of the vector’s data member, where the values are actually stored. In some use cases this is the best you can do, because you do not know in advance how large the result will be. In the case considered in our examples, however, we know exactly how many elements the result will have: exactly as many as the source vector. In such cases we can pre-allocate a storage block of the right size, and the back insertions then simply move the values into that allocated space, with no reallocations to expand the vector. This pre-reservation was a crucial part of the efficiency of the third idiom, which used emplace_back().

This approach simply adds one line to the first idiom:

Coll c4 {};
c4.reserve(init.size());   // the one extra line: allocate exact capacity up front
std::transform(init.cbegin(), init.cend(), std::back_inserter(c4),
               [] (auto i) { return C(i, i); });
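
For readers joining mid-series, here is a self-contained sketch of the same idiom. The definitions of C, Coll, and init are my own stand-ins for the series’ actual declarations:

#include <algorithm>
#include <iterator>
#include <vector>

struct C {                        // stand-in for the series’ instrumented class
    int a, b;
    C(int a_, int b_) : a(a_), b(b_) {}
};

using Coll = std::vector<C>;      // assumed alias for the collection type

int main() {
    std::vector<int> init {0, 1, 2, 3, 4};   // assumed source data

    Coll c4 {};
    c4.reserve(init.size());      // one allocation, sized exactly
    std::transform(init.cbegin(), init.cend(), std::back_inserter(c4),
                   [] (auto i) { return C(i, i); });
}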

This code performs considerably fewer operations than the first idiom:

New collection c4
[Vector allocates space for at least 5 values, c4[0] to c4[4]]

// Append C(0, 0)
Basic constructor for T0
Move constructor for c4[0] from T0
Destructor for T0

// Append C(1, 1)
Basic constructor for T0
Move constructor for c4[1] from T0
Destructor for T0

// Append C(2, 2)
Basic constructor for T0
Move constructor for c4[2] from T0
Destructor for T0

// Append C(3, 3)
Basic constructor for T0
Move constructor for c4[3] from T0
Destructor for T0

// Append C(4, 4)
Basic constructor for T0
Move constructor for c4[4] from T0
Destructor for T0

We have completely eliminated the allocate/copy/deallocate sequences. We always have to construct the value, so the only extra work performed relative to the third, emplace_back()-based idiom is the move construction of each element from its temporary and the destruction of that temporary. Both of these are likely to be fast. The move will be fast for objects that are small or implemented as small handles to a larger heap data object, though it will be slower for objects that contain a large amount of data directly. The destructor will typically be cheap, and for trivially destructible types it compiles to no code at all.
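
To see these calls for yourself, an instrumented value type along the following lines will print each event. This is my own minimal sketch, not the series’ actual tracing class, which also labels individual instances (T0, c4[0], and so on):

#include <cstdio>

struct C {
    int a, b;
    C(int a_, int b_) : a(a_), b(b_) { std::puts("Basic constructor"); }
    C(C&& o) noexcept : a(o.a), b(o.b) { std::puts("Move constructor"); }   // noexcept lets vector move rather than copy on reallocation
    C(const C& o) : a(o.a), b(o.b) { std::puts("Copy constructor"); }
    ~C() { std::puts("Destructor"); }
};

Substituting this for the plain C above and running the transform reproduces the shape of the trace shown: one basic constructor, one move constructor, and one destructor per appended element.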

This method also has the substantial advantage of neatly fitting as a sink for a standard algorithm, unlike the third idiom, which was incompatible with the standard algorithms and required reconstructing them from for loops and basic logic. For the case where the exact size of the result is known in advance, this idiom is likely preferable to the emplace_back() approach.
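
For comparison, the emplace_back()-based third idiom cannot feed a standard algorithm and has to be written as an explicit loop, along these lines (a sketch, not the series’ exact code):

Coll c3 {};
c3.reserve(init.size());
for (auto i : init)
    c3.emplace_back(i, i);   // constructs each C in place: no temporary, no move, no extra destructor call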

Applying this pattern to filters

Unlike std::transform in the example above, some algorithms do not let us know the number of results in advance. Consider std::copy_if(): we know the maximum number of results, which can never exceed the size of the source, but the actual count is determined by how many elements the filter accepts. If we’re willing to over-reserve space, we can use the above approach unchanged and accept that capacity may exceed size, perhaps significantly. The same problem exists for the emplace_back() approach, whose efficiency also depended upon pre-reservation.
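
Concretely, the over-reserving variant might look like this sketch, where src is a Coll populated elsewhere and the predicate is purely illustrative:

Coll filtered {};
filtered.reserve(src.size());   // reserve for the worst case: every element passes
std::copy_if(src.cbegin(), src.cend(), std::back_inserter(filtered),
             [] (const C& c) { return c.a % 2 == 0; });
// filtered.size() may now be well below filtered.capacity()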

C++11 added the std::vector::shrink_to_fit() function, which can eliminate the extra space after the copy_if() completes, but the call is a non-binding request and the standard gives implementations wide latitude: the function may do nothing at all, shrink the vector’s data in place, or allocate-and-copy into a smaller region. Whether it provides a real performance improvement will be specific to your application and library.
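
Continuing the hypothetical filtered vector from above, the request costs a single line:

filtered.shrink_to_fit();   // non-binding request to release the unused capacity
// afterwards, filtered.capacity() may or may not have dropped to filtered.size()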

Reconsidering my critique of the STL design in light of this idiom

I ended the last post by carping about the decision load the STL design imposes on programmers, who must choose among multiple idioms, each of which beats the others on at least one criterion. The addition of this idiom simplifies the choice substantially. For many, many use cases, I would choose this idiom.

But not for all cases. Large PODType objects are best handled via emplace_back(), which never copies or moves the value. Objects whose constructors and destructors perform expensive resource reservation/release pairs are also better suited to emplace_back(), though I cannot think of many actual use cases where large numbers of such objects would be stored in a vector.

In short, the pre-reserve/back_insert approach is appropriate for a wide range of use cases, making it a useful default idiom. Programmers must still take care to avoid it in the cases where it will be substantially less efficient than the emplace_back() idiom.

The larger point remains, although its severity is tempered: By exposing so many decisions to the programmer, the C++ language and its Standard Library increase the effort required to solve common tasks. Whether the potential gains in efficiency and robustness are worth the extra effort remains an open question.