Passing std::string by-value noticeably slows the idioms

The benchmarks comparing transform to a loop also highlighted how much of their run time was spent copying values. Copying costs become especially noticeable for complex types, such as standard strings too long for the short string optimization.

In my original presentation of the transform idiom, I slipped up and used a by-value parameter to the lambda expression rather than by-reference. For the actual benchmarks, I corrected that and specified a by-reference parameter. How much of an effect would the original by-value parameter have had?

I have rerun the benchmarks with that single change: the loop variable of the range-based for loop and the parameter to the lambda are both taken by value. The exclude value remained a global, evaluated once rather than on every loop iteration.

The resulting functions follow. The modified lines are the for (auto p : src) loop headers and the [=](auto p) lambda parameter lists:

str_loop_emplace: Basic loop using emplace_back()

std::size_t str_loop_emplace (plist& res) {
  assert ( ! res.size());
  for (auto p : src) {
    if (p.first != exclude)
      res.emplace_back(p.first, p.second*p.second);
  }
  return res.size();
}

str_loop_push: Basic loop using push_back()

std::size_t str_loop_push (plist& res) {
  assert ( ! res.size());
  for (auto p : src) {
    if (p.first != exclude)
      res.push_back(ppair{p.first, p.second*p.second});
  }
  return res.size();
}

str_option_emp: transform idiom using an output iterator calling emplace_back()

std::size_t str_option_emp(plist& res) {
  assert ( ! res.size());
  std::transform(src.cbegin(), src.cend(), opt_back_emplacer(res),
    [=](auto p) {
      if (p.first == exclude)
        return oppair();
      else
        return oppair(ppair{p.first, p.second*p.second});
    } );
  return res.size();
}

str_option: transform idiom using an output iterator calling push_back()

std::size_t str_option(plist& res) {
  assert ( ! res.size());
  std::transform(src.cbegin(), src.cend(), opt_back_inserter(res),
    [=](auto p) {
      if (p.first == exclude)
        return oppair();
      else
        return oppair(ppair{p.first, p.second*p.second});
    } );
  return res.size();
}
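To make the difference between the two parameter forms concrete, here is a minimal sketch that counts copies rather than timing them. The Counted type, the g_copies counter, and both function names are hypothetical instrumentation of my own and are not part of the benchmark code; std::for_each stands in for the transform idiom.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical instrumentation, not part of the benchmarks:
// incremented on every copy construction of Counted.
std::size_t g_copies = 0;

struct Counted {
  std::string text;
  explicit Counted(std::string t) : text(std::move(t)) {}
  Counted(const Counted& other) : text(other.text) { ++g_copies; }
  Counted(Counted&&) = default;
};

// Lambda takes its parameter by value: each call copies the element.
std::size_t copies_by_value(const std::vector<Counted>& src) {
  g_copies = 0;
  std::size_t total = 0;
  std::for_each(src.cbegin(), src.cend(),
                [&total](auto c) { total += c.text.size(); });
  return g_copies;
}

// Lambda takes its parameter by const reference: no element is copied.
std::size_t copies_by_reference(const std::vector<Counted>& src) {
  g_copies = 0;
  std::size_t total = 0;
  std::for_each(src.cbegin(), src.cend(),
                [&total](const auto& c) { total += c.text.size(); });
  return g_copies;
}
```

Calling copies_by_value on a vector of n elements should report n copies, while copies_by_reference should report none. The only textual difference between the two lambdas is const and the single &.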

Results

All other conditions of the benchmarks were the same as in the previous post.

Overview

First, a general comparison between the results for by-reference (“Ref”) and by-value (“Copy”):

overview_by_ref

The by-value construct essentially doubles the time for long strings, raises it by about 50% for short strings that fit in the short string optimization, and has little to no effect for char const * strings. Bear in mind that this change adds an extra copy for every value, not just the ones that will be appended to the result vector.
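The point that every value is copied, including the excluded ones, can be checked with a small sketch. Key, Entry, LoopResult, g_key_copies, and the two filter functions below are hypothetical names introduced only for this illustration; the benchmark's own ppair and plist types are not reproduced here.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical instrumentation: counts copy constructions of Key.
std::size_t g_key_copies = 0;

struct Key {
  std::string text;
  explicit Key(std::string t) : text(std::move(t)) {}
  Key(const Key& other) : text(other.text) { ++g_key_copies; }
  Key(Key&&) = default;
};

bool operator!=(const Key& a, const Key& b) { return a.text != b.text; }

using Entry = std::pair<Key, int>;

struct LoopResult {
  std::size_t appended;    // elements that passed the filter
  std::size_t key_copies;  // Key copies made by the loop variable
};

// By-value loop variable: every element is copied before it is tested.
LoopResult filter_by_value(const std::vector<Entry>& src, const Key& exclude) {
  std::vector<std::pair<std::string, int>> res;
  g_key_copies = 0;
  for (auto p : src) {
    if (p.first != exclude)
      res.emplace_back(p.first.text, p.second * p.second);
  }
  return {res.size(), g_key_copies};
}

// By-reference loop variable: the loop itself copies no Key.
LoopResult filter_by_reference(const std::vector<Entry>& src, const Key& exclude) {
  std::vector<std::pair<std::string, int>> res;
  g_key_copies = 0;
  for (const auto& p : src) {
    if (p.first != exclude)
      res.emplace_back(p.first.text, p.second * p.second);
  }
  return {res.size(), g_key_copies};
}
```

With a source in which every key equals exclude (the 0% copied case), filter_by_value appends nothing yet still copies each Key once, while filter_by_reference copies none.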

Long strings

The left subplot below is the (by-reference) long string plot from the last post, rescaled to match the plot for the by-value version on the right:

long_by_ref

The means have all been roughly doubled. Copying a long string means allocating a new buffer and copying every character into it, so one additional copy per element dominates the time.
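Whether a given string is "long" in this sense depends on the library. The following sketch, with hypothetical helper names, probes where a string keeps its characters. It assumes an implementation that uses the short string optimization and does not share buffers between copies, which holds for the libstdc++ default since GCC 5 but is not guaranteed by the standard.

```cpp
#include <functional>
#include <string>

// True if the character buffer lives inside the string object itself,
// which is how the short string optimization stores small values.
bool stored_in_object(const std::string& s) {
  const char* begin = reinterpret_cast<const char*>(&s);
  const char* end = begin + sizeof(std::string);
  std::less<const char*> before;  // well-defined order for unrelated pointers
  return !before(s.data(), begin) && before(s.data(), end);
}

// True if copying gave the copy its own, separate character buffer.
bool copy_has_own_buffer(const std::string& original) {
  std::string copy = original;
  return copy.data() != original.data();
}
```

On such an implementation, a 200-character string is not stored in the object and each copy gets its own buffer, whereas a three-character string is stored in place and needs no allocation when copied.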

Short strings

Again, the left subplot below is the (by-reference) short string plot from the last post, rescaled to match the plot for the by-value version on the right:

short_by_ref

For short strings, the means are mostly about 50% higher, though the mean for 0% copied is around three times higher. The extra copy has a proportionately higher effect on the 0% data set because all of its time is loop processing, which the extra copy directly increases.

Unique strings

Repeating the above structure for unique char const * strings:

unique_by_ref

Unlike standard strings, strings uniquely identified by their pointer show no effect from an extra copy. Indeed, for this type, by-reference is arguably slower than by-value, as the by-reference value requires an indirect load. In the optimized code, however, the lambda function is inlined and the two cases produce identical object code for char const *.
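As a rough illustration of why the parameter form does not matter for this type, the sketch below filters a vector of char const * both ways. The function names are hypothetical. The comparison is by address, so it is only meaningful under the assumption stated above that each distinct string is identified by exactly one pointer.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// By-value parameter: copying a char const* copies one pointer,
// never the characters it points to.
std::size_t count_kept_by_value(const std::vector<char const*>& src,
                                char const* exclude) {
  return static_cast<std::size_t>(
      std::count_if(src.cbegin(), src.cend(),
                    [exclude](char const* s) { return s != exclude; }));
}

// By-reference parameter: same comparison, reached through a reference.
std::size_t count_kept_by_reference(const std::vector<char const*>& src,
                                    char const* exclude) {
  return static_cast<std::size_t>(
      std::count_if(src.cbegin(), src.cend(),
                    [exclude](char const* const& s) { return s != exclude; }));
}
```

Both functions return the same count for the same input. A second array holding the same characters at a different address is treated as a different string, which is the price of identifying strings by pointer.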

Conclusion

Passing complex types by reference is considerably faster than passing by value. The semantics require an extra copy for by-value passing; it will not be eliminated by inlining the function.

This isn’t news. It is Recommendation F.16 of the C++ Core Guidelines. The contribution of this small post is merely to relate the cost of by-value to the relative costs of the idioms. For std::string (as implemented by libstdc++ 6.2), on average you save twice as much by passing by-reference (a matter of adding a single &) than by choosing the loop over the transform idiom.