Passing std::string by-value noticeably slows the idioms
22 May 2017 Tags: C++
The benchmarks comparing transform to a loop also highlighted the substantial contribution that copying values made to their performance. Copying costs become especially noticeable for complex types, such as standard strings too long to fit in the short string optimization.
In my original presentation of the transform idiom, I slipped up and used a by-value parameter to the lambda expression rather than a by-reference one. For the actual benchmarks, I corrected that and specified a by-reference parameter. How much of an effect would the original by-value parameter have had?
I have rerun the benchmarks with that single change applied to both idioms: the loop variable of the range-based for loop and the parameter of the lambda are now both by-value. The exclude value remained a global, evaluated once rather than on every loop iteration.
The resulting functions are as follows (in each, the modified line is the one that declares p by value):
str_loop_emplace: Basic loop using emplace_back()
```cpp
std::size_t str_loop_emplace (plist& res) {
    assert ( ! res.size());
    for (auto p : src) {                  // by-value: copies every element
        if (p.first != exclude)
            res.emplace_back(p.first, p.second*p.second);
    }
    return res.size();
}
```
str_loop_push: Basic loop using push_back()
```cpp
std::size_t str_loop_push (plist& res) {
    assert ( ! res.size());
    for (auto p : src) {                  // by-value: copies every element
        if (p.first != exclude)
            res.push_back(ppair{p.first, p.second*p.second});
    }
    return res.size();
}
```
str_option_emp: transform idiom using an output iterator calling emplace_back()
```cpp
std::size_t str_option_emp(plist& res) {
    assert ( ! res.size());
    std::transform(src.cbegin(), src.cend(), opt_back_emplacer(res),
        [=](auto p) {                     // by-value: copies every element
            if (p.first == exclude)
                return oppair();
            else
                return oppair(ppair{p.first, p.second*p.second});
        } );
    return res.size();
}
```
str_option: transform idiom using an output iterator calling push_back()
```cpp
std::size_t str_option(plist& res) {
    assert ( ! res.size());
    std::transform(src.cbegin(), src.cend(), opt_back_inserter(res),
        [=](auto p) {                     // by-value: copies every element
            if (p.first == exclude)
                return oppair();
            else
                return oppair(ppair{p.first, p.second*p.second});
        } );
    return res.size();
}
```
Results
All other conditions of the benchmarks were the same as in the previous post.
Overview
First, a general comparison between the results for by-reference (“Ref”) and by-value (“Copy”):
The by-value construct essentially doubles the time for complex, long strings, increases it for short strings that fit in the short string optimization, and has little to no effect for char const * strings. Bear in mind that this change adds an extra copy for every value, not just the ones that will be appended to the result vector.
Long strings
The left subplot below is the (by-reference) long string plot from the last post, rescaled to match the plot for the by-value version on the right:
The means have all roughly doubled. Copying a long string requires a heap allocation plus a copy of its characters, so the added copy dominates the time.
Short strings
Again, the left subplot below is the (by-reference) short string plot from the last post, rescaled to match the plot for the by-value version on the right:
For short strings, the means are mostly about 50% higher, though the mean for 0% copied is around three times higher. The extra copy has a proportionately larger effect on the 0% data set because no elements are appended to the result, so its entire time is loop processing, which the extra copy increases.
Unique strings
Repeating the above structure for unique char const * strings:
Unlike standard strings, strings uniquely identified by their pointer show no effect from an extra copy. Indeed, for this type, by-reference could in principle be slower than by-value, as accessing a by-reference value requires an indirect load. In the optimized code, however, the lambda function is inlined, and the two cases produce identical object code for char const *.
Conclusion
Passing complex types by reference is considerably faster than passing by value. The semantics require an extra copy for by-value passing; it will not be eliminated by inlining the function.
This isn’t news. It is Recommendation F.16 of the C++ Core Guidelines. The contribution of this small post is merely to relate the cost of by-value passing to the relative costs of the idioms. For std::string (as implemented by libstdc++ 6.2), on average you save twice as much by passing by-reference (a matter of adding a single &) as by choosing the loop over the transform idiom.