Tuesday, 12 November 2013

Understanding Debug Iterators in VC++ 2012

In debug mode Visual C++ STL iterators can provide useful information at run-time via failed assertions, but there’s a performance cost and in certain scenarios they can seem to grind your program to a halt. This post shows why that happens, and how to fix it. Consider the following code:

vector<int> myInts;
myInts.push_bach(1);
cout << myInts[4] << endl;

It will compile O.K., and when done so in release mode the resultant program will happily reach into memory beyond the myInts vector and send whatever garbage it finds there to cout. However, in debug mode stepping over the third line produces the popup “Debug Assertion Failed… vector subscript out of range”. That may not be a big deal in the above case but if the index were calculated in a loop dependent on a number of variables that information could be very useful.

For another example consider what’s wrong with now adding the following lines:

auto it = myInts.begin();
myInts.push_back(2);
cout << *it << endl;

Chances are this code doesn't do what you want it to because pushing the 2 onto the vector probably requires it to grow, which requires it to move, which leaves the iterator it pointing at unallocated memory. Again, in release mode cout will get whatever it is now pointing at, but in debug mode we get “vector iterator not dereferencable”. While dereferencable may not be a real word you’ll soon figure out what it means and it will probably save some debugging time.

However, these extra checks come at a cost. Pretty much any call to an iterator involves a lot of run-time checks. For example dereferencing an iterator involves checking that the parent container isn’t null, checking the iterator’s internal pointer isn’t null, and checking that the iterator doesn’t point to memory before the parent vector’s first iterator, or after its last iterator. The parent container also has extra concerns when debug iterators are being used. All iterators that point into one container form a singly linked list. This allows the parent container to communicate with those iterators, but walking the list when it needs to is slow.

Example of slow iterators in debug mode

As a simple example, consider a list of 10,000 numbers from 1-100. I wish to find the number of pairs within that list that sum to 100. Here’s some code to achieve that:

auto noCombosSumTo100 = 0;
for (auto it1 = begin(myNums); it1 != end(myNums); it1++) {
  for (auto it2 = it1; it2 != end(myNums); it2++) {
    if (*it1 + *it2 == 100) {
      noCombosSumTo100++;
    }
  }
}

On my machine this takes about 34 milli-seconds fully optimised in release mode, and about 75 seconds in debug. That's about 2200x difference between debug and release!

Obviously this difference can't be entirely attributed to the iterators, there are lots of other opportunities for the compiler to optimise the above code in release mode. However, if I implement the same algorithm using vector indexing, like this:

for (auto index1 = 0; index1 < numTerms; index1++) {
  for (auto index2 = index1; index2 < numTerms; index2++) {
    if (mynums[index1] + mynums[index2] == 100) {
      noCombosSumTo100++;
    }
  }
}

it runs in about 4.5 seconds, so there's clearly a big cost to using debug iterators.

Ways to improve iterator performance

The improvements I'm about to show for the first code snippet could apply to any code, but it's worth understanding that any inefficiencies are exacerbated when debug iterators are involved. To improve the first code snippet while still getting the benefit of debug iterators I can:

These three modifications bring the execution time down to 21 seconds. Interestingly none of them make any difference in the second snippet where built in types are involved. Again, it's worth understanding that these improvements would apply to any user defined type (in debug mode), but it probably wouldn't be just as noticeable as when iterators are involved.

If you want a further speedup then you'll need to sacrifice some additional run-time checks by setting _ITERATOR_DEBUG_LEVEL=1 or 0 (This macro supersedes and combines the functionality of the older _SECURE_SCL and _HAS_ITERATOR_DEBUGGING macros). Doing so decreases the amount of debug help you get, but it brings the original example's execution time down to about 8.5 seconds.

At 8.5 seconds it's still about twice as slow as indexing the vectors, and this is because even with the reduced checking there are more indirections (extra function calls) associated with dereferencing a debug iterator.

In summary...

Remember, the example code used here takes 34 milli-seconds in release mode. The example using vector indexing also takes about 34 milli-seconds in release, and the same algorithm indexing a raw array of ints takes the same time to execute, about 34 milli-seconds. This indicates that the code is being optimised to similar assembly in all three cases.

So, to summarise, STL iterators provide useful debug information for development, they allow you to access STL containers as if you are using a pointer, and they create production code that is as fast as using raw pointers. However, there is a cost in debug mode as they can slow your code down a lot. But, if these costs become too much of a hindrance there are ways to reduce the effect without changing how you are using the iterators.

No comments: