Tuesday, 16 December 2014

ManualResetEvent and ManualResetEventSlim (.NET)

Starting with .NET 4.0 the framework provides a ManualResetEventSlim class to compliment the existing ManualResetEvent. ManualResetEventSlim is much more efficient for short wait times. It can be used when the event doesn't cross process boundaries so if this is the case, and you don't expect threads to be blocked for long, then ManualResetEventSlim is the better option.

The efficiency improvement of ManualResetEventSlim is achieved because when it enters a Wait() (equivalent to ManualResetEvent.WaitOne()) it initially spins in a busy loop for a short time while waiting for the event to become signalled. This StackOverflow answer explains how the spinning is done using a combination of Thread.Yield, Thread.SpinWait, Thread.Sleep(0) and Thread.Sleep(1). If the event is signalled during this period then the wait returns faster than the ManualResetEvent, which is implemented using a Monitor.Wait.

If the event doesn't occur within a certain short time then the ManualResetEventSlim switches to use a Monitor.Wait just like the ManualResetEvent. Note that the initial spinning is CPU intensive, so if you are sure that wait times will be longer you can save CPU cycles by using a ManualResetEvent and performance won't suffer.

Illustration by Example

To illustrate the potential performance improvement consider the following code. Two tasks are created which progress around a loop 1 million times in tandem as they signal each-other to continue.

var numRepeats = 1000000;
var stopwatch = new Stopwatch();
var firstTask = new ManualResetEventSlim(false);
var secondTask = new ManualResetEventSlim(false);
var firstTaskComplete = new ManualResetEvent(false);

Task.Factory.StartNew(() => {
    for (var x = 0; x < numRepeats; x++) {
        firstTask.Wait();
        firstTask.Reset();
        secondTask.Set();
    }
    firstTaskComplete.Set();
});

Task.Factory.StartNew(() => {
    for (var x = 0; x < numRepeats; x++) {
        secondTask.Wait();
        secondTask.Reset();
        firstTask.Set();
    }
});

stopwatch.Start();
firstTask.Set();
firstTaskComplete.WaitOne();
stopwatch.Stop();
Console.WriteLine("Total time: " + stopwatch.Elapsed);

Using ManualResetEventSlim as shown the above executes in about 0.52~0.55 seconds on my machine. If I switch to using ManualResetEvent then the same task takes about 11~12 times longer, 5.8~6.4 seconds.

If I leave the events signaled for the entire task without setting or resetting them, so the waits still check the manual reset event but are never blocked, then the difference between ManualResetEventSlim and ManualResetEvent is much more prominent, up to 40x.

No Equivalent AutoResetEventSlim?

There's currently no slim version of AutoResetEvent as it's difficult to think of a time when it would be useful. If you're using AutoResetEvent it's probably because you want to give threads exclusive access to a resource to carry out some work. Chances are this work will take some non-zero amount of time, in which case it would be wasteful for other threads to be spinning while they wait for their signal to continue. Therefore, a slim version of AutoResetEvent is unlikely to be useful. For more info see this discussion.

Monday, 18 August 2014

AutoResetEvent and ManualResetEvent (.NET)

AutoResetEvent and ManualResetEvent facilitate synchronisation between threads by signalling when an event has occurred. A waiting thread calls the WaitOne() method which blocks while the event is in the non-signaled state. When the event is signaled from another thread then the waiting thread continues. New events are constructed with a Boolean parameter to be initially signaled or not.

The difference in the auto and manual reset events is that AutoResetEvent is automatically reset to the non-signalled state when a thread is released from a WaitOne() call. ManualResetEvent objects, on the other hand, must be reset manually.

A good analogy that is often used to describe the difference between AutoResetEvent and ManualResetEvent is that AutoResetEvent is like a turnstile that lets threads through one-at-a-time, and ManualResetEvent is like a flood gate that lets all threads through until it's closed.

To illustrate consider how results of a race could be compiled and processed in parallel by different tasks:

var newResultIn = new AutoResetEvent(false);
var allResultIn = new ManualResetEvent(false);

Task.Factory.StartNew(() => {
    while (results.Count != totalEnteries) {
        results.Add(GetNextResult());
        newResultIn.Set();
    }
    allResultIn.Set();});

Task.Factory.StartNew(() => {

    while (results.Count != totalEnteries) {
        newResultIn.WaitOne();
        UpdateLeaderBoard(results);
    }
});

Task.Factory.StartNew(() => {
    allResultIn.WaitOne();
    ProcessMaleAndFemaleWinners(results);
});

Task.Factory.StartNew(() => {
    allResultIn.WaitOne();
    ProcessAgeGroupWinners(results);
});

In this example the first task gets results and adds them to a list until all results are in. Each time a new result is added this is signaled to the second task using AutoResetEvent.Set(). The second task then continues past WaitOne(), at which point the event returns to the non-signaled state. When the leader board has been update the second task will again block on WaitOne() until the first task signals that another result is in.

When all results are in the first task then signals the ManualResetEvent allResultsIn. At this point the third and fourth tasks continue past allResultsIn.WaitOnt() and they process the results as necessary. Note that as a ManualResetEvent is used here the event remains signaled after a thread passes WaitOne(), meaning that the third and fourth tasks execute in parallel. If an AutoResetEvent had been used here then only one of the two waiting tasks would have continued.

Monday, 21 July 2014

C# 5.0 Breaking Change to Closures

In C# a lambda can capture local variables from the scope in which it's defined, and lambdas that do so are called closures. Captured variables are evaluated when the lambda is invoked, not when the variables were captured. For example, the following code:

int age = 30;
Action printAge = () => Console.WriteLine(age);
age++;
printAge();

prints 31, not 30.

Up until C# 4.0 there was one context in which closures didn't act as most people would expect. Consider the following:

var actions = new Action[3];
int i = 0;

foreach (var c in "dog") {
  actions[i++] = () => Console.Write(c);
}

foreach (var action in actions) {
  action();
}

Prior to C# 5.0 this printed "ggg". In effect the variable c persisted over each iteration of the first foreach, making it equivalent to:

char c;
for (int i = 0; i < 3; i++) {
  c = "dog"[i];
  actions[i] = () => Console.Write(c);
}

On the third time through the loop c is set to "g". Therefore, when the closures are executed c is evaluated to be "g", and so that's what's printed out three times.

If you wrote the loop using for as shown above then this is what you should expect. However, if you used the original foreach you would probably expect the output to be "dog", and starting with C# 5.0 that's what it is. In effect, from C# 5.0 on, the foreach loop is equivalent to:

for (int i = 0; i < 3; i++) {
  char c = "dog"[i];
  actions[i] = () => Console.Write(c);
}          

The subtle difference here is that the variable c is declared locally within each iteration of the foreach loop. Therefore, each closure captures a different variable, and it's value isn't changed from the time it's captured until the time it's evaluated.

Note this is a breaking change as code compiled prior to and since C# 5.0 will behave differently. However, it was felt that the change was worth it as it cleared up confusing behaviour.

Tuesday, 22 April 2014

Haskell Prime Number Generator (part 3 of 3)

This is the third of three posts looking at prime number generators in Haskell. The first post looked at an elegant but inefficient one-liner. Part 2 looked at longer but much more efficient solution. The code for those two posts was taken from users gauchopuro and vineet in the Project Euler forum.

In this, the final post on the topic, I tweak vineet's version firstly to make it a bit more readable, and then to improve its efficiency.


A more readable, but still efficient Prime Number Generator

The previously discussed solution uses three separate functions, but using the where keyword and partial application (which are both explained below), this can be reduced to the single function:

primes = 2 : filter (isPrime primes) [3..] where
  isPrime (x:xs) a
    | x*x > a        = True
    | a `mod` x == 0 = False
    | otherwise      = isPrime xs a

The where keyword allows you to use a term that was previously undefined, and which you then define. For example

volumeOfCylinder radius length = length * crossSectionalArea where
  crossSectionalArea = pi * radius * radius

where helps the readability in the above example and it also keeps the variable crossSectionalArea local to volumeOfCylinder. So, for example, adding:

flowRate height width volumetricFlowRate = volumetricFlowRate / crossSectionalArea where
  crossSectionalArea = height * width

to the same Haskell source file is fine and there's no name clash with crossSectionalArea from the original volumeOfCylinder function.


Partial Function Application in Haskell

Partial application is when a function is called without supplying all the arguments needed to evaluate it. For example, if I define the following function:

multiply x y = x * y

then I can define the function double through partial application of multiply like this:

double = multiply 2

In the definition of double I partially apply the multiply function supplying only one argument. The type of double is therefore a function that takes a Num (the second argument to 'multiply 2') and returns a Num. I could have specified the multiply's type to be a function that takes two Nums and returns a Num but if I let Haskell infer its type it will consider it to be a function that takes a Num and returns a function that takes a Num and returns a Num. This is expressed in Haskell notation as:

multiply :: Num a => a -> a -> a

and it makes more sense when considering the possibility of partial application.


Using Partial Application in the Prime Number Generator

The prime number generator at the top of this post contains:

filter (isPrime primes) [3..]

Remember that filter expects a predicate and a list. The predicate is applied to each item in the list and a new list is constructed from those items that return True. isPrime is a predicate, but it needs two arguments and so it can't be used by filter. However, (isPrime primes) is a partially applied function. It's already been supplied with a list of primes and so it only needs a number to test. It is therefore a suitable predicate and can be used by filter.

Along with the details in the previous two posts you should now have enough knowledge to understand this prime number generator.


A Final Efficiency Improvement

The prime number generator at the top of this post is quite clean and also pretty efficient. The final part of this post makes a small tweaks to give a slight efficiency improvement. It does make the code twice as long though, so you may not think it's not worth it.

The improvement is based on the fact that all prime numbers greater than 3 can be written in the form 6k +/-1. This follows from 'sieving' out all the multiples of 2 and 3. If we use this sieved list in the original code in place of [3..] then we'll only be testing prime candidates that can be written as 6k +/-1. This can be achieved by zipping the list of numbers that can be written as 6k-1 together with the numbers that can be written as 6k+1. The final solution therefore looks like this:

primes = 2 : 3 : filter (isPrime primes) times6PlusMinus1 where
  times6PlusMinus1 = concat $ map (\(x,y) -> [x,y]) $ zip times6Minus1 times6Plus1 where
    times6Minus1 = (map (\x -> 6*x-1) [1..])
    times6Plus1 = (map (\x -> 6*x+1) [1..])
  isPrime (x:xs) a
    | x*x > a        = True
    | a `mod` x == 0 = False
    | otherwise      = isPrime xs a

Again, you may not feel that the slight efficiency gain is worth the extra code. You could also think of other ways to zip the lists together, and you could go further and sieve out multiples of more primes. If I come up with or find any major improvements on this I'll do another post but for now I'll finish with this Haskell prime number generator.

Monday, 31 March 2014

Haskell Prime Number Generator (part 2 of 3)

In my previous post I detailed the following elegant but inefficient Haskell prime number generator:

primes (x:xs) = x : primes (filter ((/= 0) . (`mod` x)) xs)

This algorithm was taken from user gauchopuro in the Project Euler Problem 7 forum. In this post I'm going to look at a prime number generator also submitted in that forum.

Again, as I'm currently learning Haskell, I'm going to explain this function at the beginner level.


Prime Number Generator from user vineet:

isPrime x = isPrimeHelper x primes

isPrimeHelper x (p:ps)
        | p*p > x        = True
        | x `mod` p == 0 = False
        | otherwise      = isPrimeHelper x ps

primes = 2 : filter isPrime [3..]

This prime number generator is a bit longer than the previous one. It doesn't use a sieve, and it involves three functions, each of which, as I will show, is quite simple. The above code uses recursion, pattern matching, the filter function, the cons operator, and lazy evaluation, all of which are explained in the previous post. See that post for more details.

In addition, to parse the above code, you'll need to understand Haskell guards. The isPrimeHelper function is defined for three different conditions using the guard notation '|'. If the predicate after the guard evaluates to True then the function definition to the right of the corresponding equals is used. If more than one predicate evaluates to True only the first one is used. So, in the above example, if p*p > 0 then isPrimeHelper will return True. If p*p > 0 is False, then x `mod` p == 0 is evaluated next. If that's True then isPrimeHelper returns False. Otherwise isPrimeHelper returns the result of isPrimeHelper x ps.

Now to parse the above code. Firstly consider that the list of prime numbers is defined as:

primes = 2 : filter isPrime [3..]

This line is quite simple. It says that the list of prime numbers starts with 2, and is followed by all the numbers from 3 to infinity that return True when passed to isPrime. The isPrime function, again, looks like this:

isPrime x = isPrimeHelper x primes

This function uses the final component of this prime number generator, isPrimeHelper, to determine if the argument x is prime or not. Notice that it also uses primes which we've defined as the list of prime numbers. The list primes is infinite and so it's never complete, and you may think this would be a problem. However, as will be explained, this works thanks to Haskell's lazy evaluation and the way isPrimeHelper is written and called. Here's isPrimeHelper again:

isPrimeHelper x (p:ps)
        | p*p > x        = True
        | x `mod` p == 0 = False
        | otherwise      = isPrimeHelper x ps

Before explaining this code, here is an explanation of how the algorithm for isPrimeHelper works. It takes a number to check, x, and a list of primes (p:ps). If any number in the list of primes divides evenly into x then x is not prime, and the function returns False. The primes are checked one by one beginning with the smallest. If the algorithm reaches a primes that is greater than the square root of x then there's no need to check any further as x is prime (because any number x can have only one prime factor greater than the square root of x, and in the case of a prime number that factor is x itself).

The code for isPrimeHelper works as follows. The function uses guards (explained earlier). The first element from the list of primes passed in is p. If p*p > x, then p is greater than the square root of x, so x is a prime number and the function returns True. Otherwise the function checks if p is a factor of x. If so then x is not prime and isPrimeHelper returns False. Finally, if p isn't a factor of x then isPrimeHelper calls itself recursively with p being removed from the front of the list of primes.

Remember that the list of primes passed into isPrimeHelper initially has just the single element 2, and this list is extended as the algorithm runs. Due to lazy evaluation isPrimeHelper only requires the elements one at a time until it returns, and it never runs out of elements because one the conditions that lead to the function returning will always occur before the end of the list is reached.

This prime number generator is much faster that the previous one. Although it's a bit longer, each of the three functions it uses is quite simple to understand. In another post I'll show how to make this function a bit more performant, and in my opinion a bit more readable.

Friday, 21 February 2014

Haskell Prime Number Generator (part 1 of 3)

Project Euler Problem 7 reads "What is the 10 001st prime number?". Solving this problem requires a prime number generator. I solved this using Haskell, which I'm currently learning, but after I'd done so I learned a lot more by looking at others' solutions. Here I present some of the Haskell prime number generators being discussed in the Project Euler forum, problem 7 thread (I can't link to the forum as you need to have an account and have solved Problem 7 to access that thread):

As I'm just learning Haskell my aim is to annotate these examples at my own level, i.e. for beginners. Note, the examples I'm referring to were meant to solve a particular problem, so in some cases I've tweaked them slightly to make them general prime number generators.


Prime Number Generator from user gauchopuro:

primes (x:xs) = x : primes (filter ((/= 0) . (`mod` x)) xs)

This function must be supplied with a list of the positive integers starting at 2 and it uses the Sieve of Eratosthenes to calculate the list of the primes. For example a function that uses the above to get the first n prime numbers would look like this:

getFirstNPrimes n = take n $ primes [2..]

Or to get the nth prime:

getNthPrime n = primes [2..] !! (n-1)

Although very concise, once you understand the notation involved, primes is also very expressive. Here's what you need to know to parse it:

  1. primes uses pattern matching. It expects to be supplied with a list of integers and the(x:xs)to the left of the equals sign means that within the function (i.e. to the right of the equals sign) x will refer to the first element in the list and xs (pronounced exes, as in the plural of x) refers to the remainder of the list.
  2. primes is recursive. The list that it generates consists of the first element of the list passed into it and the output of primes when passed the remainder of the list after some filtering.
  3. The filter function expects two arguments. The first argument is a predicate and the second is a list. filter returns a new list consisting of the elements from the initial list that yield True when passed to the predicate.
  4. The . operator composes two functions i.e. (f . g) x == f (g x).
  5. The cons operator ':' appends a new element to the start of a list.


Haskell Lazy Evaluation

Although not strictly required by the primes function shown, it's also useful to understand Haskell's lazy evaluation. It means that Haskell will do just enough work to get to the result it needs. For example, consider the following function:

fib :: Int -> Int -> [Int]
fib a b = a : fib b (a+b)

The first line tells us that this function takes two integers and returns a list of integers. The second line defines the function and says that the list to return is made up of the first integer argument followed by the list produced by calling fib again recursively, with the original second argument as the new first argument, and the sum of the original arguments as the new second argument. This function can be called with 1 1 to generate the Fibonacci sequence as follows:

fib 1 1 = 1 : fib 1 2
        = 1 : 1 : fib 2 3
        = 1 : 1 : 2 : fib 3 5
        = 1 : 1 : 2 : 3 : fib 5 8 etc...

You can see how this generates the Fibonacci sequence, but if you're not familiar with lazy evaluation you'd expect this function to call itself recursively indefinitely and never return. However, lazy evaluation ensures that the function does just enough work to get the result it needs. So if you just want the first 10 Fibonacci numbers you call the function with:

take 10 (fib 1 1)

and the result is:

[1,1,2,3,5,8,13,21,34,55]

After the first 10 elements are generated Haskell has the result it needs and so fib returns without continuing to call recursively.


Understanding primes

Here is primes again:

primes (x:xs) = x : primes (filter ((/= 0) . (`mod` x)) xs)

and, again, it is to be called with the list of integers starting at two, for example like:

primes [2..] !! 101          -- To get the 100th prime

With the above knowledge you can now understand primes as implementing the Sieve of Eratosthenes as follows:

  1. It's originally passed a list of integers starting at 2 (2, 3, 4, 5...). The first element of that list, x, will originally be 2. This becomes the first element in the list of prime numbers to be returned.
  2. The remainder of the list, xs, is the integers from three upwards. They are filtered with the predicate ((/= 0) . (`mod` x)) to create a new list. Remember x is still 2, so this predicate first finds the remainder when the input is divided by 2, and returns True if that remainder is non-zero. As True is returned for all odd elements in xs, the even numbers are filtered out of the list. This filtered list (3, 5, 7, 9...) is passed into primes recursively.
  3. This time the first element in the list is 3, so x  is 3, and this becomes the next prime number that is returned. Now xs is the list of odd integers starting at 5. They are filtered as before with x=3 so that the filtered list is the integers from 5 that don't divide evenly by 2 or 3 (5, 7, 11, 13, 17...). This list is passed recursively into primes.
  4. This process continues adding another prime number to the list each time.
This process can be illustrated with the following equivalences:

primes [2, 3, 4, 5, 6...] = 2 : primes [3, 5, 7, 9, 11...]
                          = 2 : 3 : primes [5, 7, 11, 13, 17...]
                          = 2 : 3 : 5 : primes [7, 11, 13, 17, 19...]
                          = 2 : 3 : 5 : 7 : primes [11, 13, 17, 19, 23...]
                          = 2 : 3 : 5 : 7 : 11 : primes [13, 17, 19, 23, 29...]

Although not the most efficient prime number generator, I did find this one to be expressive, and a good learning aid.

I'll follow up in future posts with more prime number generators.

Monday, 3 February 2014

Comparison of C++ and Haskell for Project Euler

I've already solved quite a few of the problems on Project Euler using C++, but as I've recently been learning Haskell, I decided to redo them all using it. Solving real problems is a good way to learn a new language and as I've already done them in C++ I can make some comparisons between the two languages.


Haskell for Maths Problems

Without using Haskell for very long it's easy to see that it is well suited to maths problems. This is because the syntax is optimised for expressing mathematical statements and it often resembles maths notation. For example, consider the following definition for the absolute function, absolute:

absolute n
  | n >= 0    = n
  | otherwise = -n

You don't need to know Haskell to figure out what this function does. Another frequently used example is the factorial function. It can be defined, for example, in the following three ways:

factorial n
  | n == 0    = 1
  | otherwise = n * factorial (n-1)

factorial2 0 = 1
factorial2 n = n * factorial2 (n-1)

factorial3 n = product [1..n]

Not only are these more succinct than in a procedural language, but they also read more naturally.


Haskell for Project Euler, Problem 5

Given Haskell's suitability for maths problems, it should be well suited to Project Euler. I'll demonstrate that here using Problem 5, "What is the smallest positive number that is evenly divisible by all of the numbers from 1 to 20?"

Firstly, here is an explanation of my approach. The solution must be a multiple of all the prime numbers below 20, and as they don't share any common factors the solution must be a multiple of the product of the primes below 20. Therefore I calculate this product and test multiples of it until I find the first one that the non primes below 20 also divide evenly into.

Here is the solution implemented in C++

bool isValidAnswer(int numberToCheck) {
  vector<u64> nonPrimesTo20{ 4, 6, 8, 9, 10, 12, 14, 15, 16, 18 };
  auto isMultiple = true;
  for_each(nonPrimesTo20.begin(), nonPrimesTo20.end(), [&](u64 n) {
    if (numberToCheck%n != 0) {
      isMultiple = false;
    }
  });
  return isMultiple;
}

int main(void) {
  const auto productOfPrimesTo20 = 2*3*5*7*11*13*17*19;
  auto numberToCheck = productOfPrimesTo20;

  while (!isValidAnswer(numberToCheck)) {
    numberToCheck += productOfPrimesTo20;
  }
  cout << "Answer is " << numberToCheck << endl;
  return 0;
}

and here is the same algorithm implemented in Haskell:

productOfPrimesTo20 = 2 * 3 * 5 * 7 * 11 * 13 * 17 * 19
nonPrimesTo20 = [4, 6, 8, 9, 10, 12, 14, 15, 16, 18]
isValidAnswer x = and [mod x n == 0 | n <- nonPrimesTo20]
answer = head [x | x <- [productOfPrimesTo20, productOfPrimesTo20*2..], isValidAnswer x]

Again, it is clear that the Haskell solution is much more succinct than in C++. Not only that, but if you understand the Haskell notation, the Haskell solution is also more expressive, and it's easier to read as you don't need to keep track of a loop or any variables.

If you're not familiar with Haskell then this code will look pretty obfuscated, but I've got posts coming up on Haskell prime number generation that explain many of the techniques used above.

Sunday, 12 January 2014

Comparison of tuples using variadic templates

In a previous post I explained how the tuple has been implemented in VC++ 2013 using variadic templates. That post gave some explanation of the definition and construction of tuples. You'll should understand that post before you read this one.

The comparison of tuples is interesting and it also uses variadic templates. The STL provides comparison operators for tuples even though they can be composed of many different types, which aren't known, and often don't exist, until the client code is written. This post shows how this is achieved.

Comparison of tuples

The include file <tuple> contains the (non-member) helper operator:

template<class... _TypesLhs, class... _TypesRhs> inline
bool operator==(const tuple<_TypesLhs...>& _Left, const tuple<_TypesRhs...>& _Right)
{
  return (_Left._Equals(_Right));
}

Here _TypesLhs is the types of the tuple on the left hand side and _TypesRhs the types of the tuple on the right. Note that this is not the type of the tuple itself, but the collection of types of all the elements of the tuple. This function simply calls through to the tuple member function _Equals which looks like this:

template<class... TypesRhs>
bool _Equals(const tuple<TypesRhs...>& _Right) const
{
  return (_Myfirst._Val == _Right._Myfirst._Val
  && _Mybase::_Equals(_Right._Get_rest()));
}

Remember that a tuple is composed of a base part that holds a single element of a templated type, and a derived part that holds the remaining elements and the collection of associated types. This pattern continues recursively down to a base class that doesn't hold any value and is based on a template definition with zero types.

The _Equals function above compares the two elements in the most derived class. If they don't match then it will return false. If they do match is will call recursively into the _Equals function in the base class. If these calls make it all the way to the base tuple, which is templated for zero arguments, then it always returns true and the tuples match. This is another example of how powerful variadic templates can be.

You should understand that if the corresponding elements of two tuples can't be compared with the == operator then compiler will complain if you try to compare those tuples. This reminds us how much of the work here is being done at compile time. Template classes and template functions are not classes and functions. They're templates that the compiler uses to write classes and functions during compilation, and this is just the same when variadic templates are being used.

Comparing tuples with different numbers of elements

Comparing tuples of different sizes obviously shouldn't return true, but this is in fact also dealt with at compile time. You'll need to understand a few more things to see how this is done, but they're interesting so I'm adding them into this post.

Firstly, I previously omitted from the _Equals function the following line:

static_assert(_Mysize == sizeof...(_Other), "comparing tuple to object with different size");

The first parameter to a static_assert must be a constant expression that can be converted to a bool. A constant expression is an expression that can be evaluated by the compiler. As I'll show, _Mysize is a const and sizeof can be figured out by the compiler, so _Mysize == sizeof...(_Other) is a constant expression. If the compiler evaluates this expression to true then the static_assert declaration has no effect. If it evaluates to false the compiler gives an error with the message supplied in the second argument to static_assert.

So, how does the compiler determine the size of each tuple? The tuple member variable _Mysize is defined as follows:

static const size_t _Mysize = 1 + sizeof...(_Rest);

Here, _Rest is the collection of types of the tuple from which this class is derived, i.e. all the types not held by this class. To understand how this calculation works consider sizeof(tuple<double, long, std::string>). Here:

sizeof(tuple<double, long, std::string>) = 1 + sizeof(tuple<long, std::string);
sizeof(tuple<long, std::string>) = 1 + sizeof(std::string);
sizeof(tuple<std::string>) = 1 + sizeof(tuple<>);
sizeof(tuple<>) = 0;

This should explain how comparing tuples with differing numbers of elements can give a compilation error.

Sunday, 5 January 2014

More on STL Tuples - Implementation using Variadic Templates

My previous post on STL tuples mentioned that they can be implemented using variadic templates. This post looks at the VC++ 2013 tuple implementation to see how this has been achieved. Doing so gives a great example of how powerful variadic templates can be.

(Note, in order to keep the code below clean I've cut out some details, e.g. template parameters with defaults that are used. These extras have no affect on what I'm explaining and clutter up the code snippets. If you want to see these details go look in <tuple>.)

Simple class definition for a tuple in VC++ 2013

Firstly, there is the following class definition for tuple:

template<>
class tuple<>
{
public:
  tuple()
  {
  }
  // more implementation

This is a template class with no template parameters, and a constructor that takes no parameters. You could invoke this constructor with:

tuple<> temp;

but I can't think of a reason to ever do that.

Variadic template class definition for the tuple

There is then another tuple definition that uses variadic templates:

template<class T, class... _Rest>
class tuple<T, _Rest...> : private tuple<_Rest...>
{ // recursive tuple definition
public:
  _Tuple_val<T> _Myfirst;
  tuple<_Rest...> _Get_rest();

To understand this you'll need to know that the ... notation can be read as "collection of". Also, if you're not familiar with the class keyword inside the template angle brackets, it's equivalent to typename in this context.

Now you can begin to parse the above code. This definition of tuple is templated for a parameter of type T and a collection of types referred to as _Rest. It privately inherits from a tuple that is templated for this collection of types. The comment "recursive tuple definition" is not mine, it's taken from the VC++ 2013 STL code. It explains that when using the above definition the compiler will recursively create new tuple definitions for a shrinking list of types until that list is empty, at which point it will use the original tuple definition, which is templated for zero types.

To explain with an example, consider you wish to use tuple<double, long, std::string>. The compiler will first create a class definition of tuple that is templated for type double and the collection of types long and std::string. This class will inherit from another class definition of tuple that is templated for type long  and the collection of types std::string. This class will inherit from a tuple class that is templated for type std::string and inherits from the original tuple class definition that is templated for zero types.

Construction and Implementation details

Look at the two members that I've picked out. _Tuple_val<T> is a helper class and it's used to store a value of type T_Get_rest() is a member function that returns a reference to the base class. So, from the previous example, in the most derived class, which is templated for type double and the collection of types long and std::string_Myfirst refers to the value of the double and _Get_rest() returns a reference to its base class i.e. the class that is templated for a long  and the collection of types std::string.

Now look at the constructor that is used when many types are passed in:

explicit tuple(T&& _This_arg, _Rest&&... _Rest_arg)
_Myfirst(_This_arg), tuple<_Rest...>(_Rest_arg...)
{
}

As you can see, the constructor simply initializes the _Myfirst variable with the corresponding value _This_arg, and the rest of the arguments are passed to the base class, which is by definition a tuple templated for the types of those arguments.

Hopefully this example gives a good idea of the power of variadic templates. I'll follow up with another post on some more VC++ 2013 tuple implementation details including how they are compared.