Wednesday, January 11, 2017

C++ tips, 2017 Week 1 (2-Jan - 8-Jan-2017)

This is part of my weekly C++ posts based on the daily C++ tips I make at my work. I strongly recommend this practice. If you dont have it in your company start it. 
List of all weekly posts can be found here.


1. No raw  loops

If you have watched Sean Parent's great talk C++ seasoning you know the goal No raw loop (if you haven't - go and watch it). In short - always try to not use raw loops.

The rationale behind this is simple. When you are using raw loops you are usually iterating over something. Most likely it will be a container. Most likely it will be a std or boost container in which case you are going to use iterators. And since you are doing something while iterating over the container with those iterators it is most likely something already implemented in the <algorithm> library - either being finding, for_each-ing, merging, copying, sorting, filtering, etc.



The benefit of using STL algorithms instead of raw loops is that you write your intend better. You write more readable code. On top of that you don't have to worry about corner cases and scalability because the STL algorithms are written with that in mind. On top of that the compiler is probably aware of what those algorithms are supposed to do and is able to do proper optimizations (Note: this is an assumption). The same goes for the static analyzers.

Even if you write performance critical code it is always better to write it using the STL algorithms the first time. This way you achieve several things - write your intend better so reviewers are able to comprehend it easier, create the base case about which you will be able to write your tests and also create the base measurements to which you are going to compare your optimizations.

Of course it is not always applicable and that's why it is a goal not a rule.

2. std::for_each vs for

In the spirit of the previous tip lets compare std::for_each with for. Should you write:

for (auto& elem : vect) { 
elem += 5; 
}

or

std::for_each(std::begin(vect), std::end(vect), [](auto& elem){ 
elem += 5; 
});

or why not use std::transform:

std::transform(std::begin(vect), std::end(vect), std::begin(vect), [](auto& elem){ 
return elem + 5; 
});

Which one will be most efficient?

Compiler explorer to the rescue! Not that surprisingly in all three cases clang (here)  and gcc (here) create identical assembly (the assembly windows are the same - scroll one of them to compare the different functions). Same goes for icc and MSVC. Clang even uses vectorization in this very simple example. I've made another example with a simple struct, virtual function call and a member function call in which we output to std::cout  - here - and the result is the same - identical assembly for those three functions (minus the vectorization and few additional instructions for the std::transform case).

My guess (the compiler writers probably know better) is that the compilers are aware of what the <algorithm>s are supposed to do and they use that knowledge to optimize. Readability leading to better optimizations.

This is not the case for our custom raw loops that even future-us will have troubles comprehending.

Note: This has to be confirmed from the right people. It appears to be this way but don't take anything for granted. Never guess about performance.

Back to the question: Which one should be used?

I'm slightly inclined to use std::for_each when I am obviously for_each-ing. Besides - from C++17 I will be able to parallelize it where it is suitable by just adding execution policy.

3. Guaranteed copy elision

From C++17 we have guaranteed copy elision in some cases - that is important its not always only in some specific cases. Copy elision means that the compiler is allowed to omit constructing a object and than copying to somewhere (think return by value for example) but instead to construct the object directly in that somewhere. Most common cases are Return Value Optimization (RVO) and Named Returned Value Optimization (NRVO). I suggest this extremely detailed article by Jonas Devlieghere about copy elision.

What the proposal does is to tweak the definition of the value categories (glvalue, prvalue) in a way that it is easy to distinguish when an evaluation (object construction for example) computes a location and when it is only initialization. I'm simplifying it a little here.

This tweak enables to more clearly specify the second case - when we initialize something without specifying where it should be located. This is where the copy elision is guaranteed.

Thus we will be able to return non-movable and non-copyable temporaries by value without getting compiler errors. For example returning a temp object by a class with deleted copy and (because of that) no move operations:

struct Foo {  
Foo() = default; 
Foo(const Foo &) = delete; 
}; 
Foo f() {   
return Foo(); 
} 
int main() {   
Foo foo = f(); 
}

Compiles under C++17. Because calling that constructor Foo() constructs the object foo directly.

Very important note. Guaranteed copy elision does not work for named variables. Nothing is guaranteed here.

Foo f() {   
  Foo fooLocal;
         ......
  return fooLocal; 
}

The authors of the proposal felt that it will be way to difficulty to make it clear when and what should be guaranteed when named variables are involved so they decided to go with what can be done clearly instead of thinking over it until C++20.

4.  Extern template

Extern templates (not to be mistaken with exported templates) is a way to save the compiler and the linker some work. If you know that you instantiate a template in one translation unit but also use it and inevitably instantiate it again in another you can write:
extern template class MyVector<int>;
thus telling the compiler that there is already instantiation of MyVector<int> somewhere.


5. boost::flat_(multi)map/set

The interface and behavior of the standard associative containers is described in the Standard in a way that the underlying implementation can only be hash table and binary tree. As we know with generalization come inefficiencies and with specialization and trade-offs come optimizations on a chosen metric.

Sometimes we want extremely fast lookup but we don't care about insertion times. This is where boost flat associative containers come handy. They have similar functionality to the standard associative containers but the underlying implementation is vector-like container.

The trade-off here is getting data locality, faster lookup, less memory consumption and faster iteration (random-access iterators) for much slower insertion and erasure (elements of the vector have to be rearranged, yes - move semantics and stuff - but still), loosing stable iterators, not be able to store non-movable and non-copyable objects and weakening the exception safety (we are actually moving/copying objects as opposed to changing pointers to and from nodes containing objects).