Monday 21 October 2013

Some Math Utility Functions (on GitHub) to help out with Project Euler

I've been working through the problems on the Project Euler website recently, and as there were some maths functions that I needed over and over I decided to put them into a (C++) utility namespace, MathUtils, which I've put up on GitHub. This post is a short overview of what MathsUtils currently provides. The code should (hopefully) be self explanatory so I'll not provide much more detail here.

(I've also got a class up there called embiguint, which is an unsigned integer that can be arbitrarily large. It has proved useful in its current state but I intend to add a lot more functionality and make it behave more like a built in type at which point I'll do a blog post on it too.)

The functions currently included in MathsUtils are:

template <typename T> std::string
                     ToAlternateBaseRepresentation(
                     const T& value, const T& base);
template <typename T> std::vector<T>
                     GetDivisors(const T& number);
template <typename T> emu64 GetSumOfDivisors(const T& number);
template <typename T> emu64 Factorial(const T& x);
template <typename T> T
                     PowerOf(const T& base, const T& power);
template <typename T> std::vector<bool>
                     GetPrimesBoolArrayToN(const T& n);
template <typename T> std::vector<T> GetPrimesToN(const T& n);

I had originally developed the functions in a class with static members but I changed this to a namespace as it seemed more more natural. After all, classes are intended to define how to build objects whereas namespaces are used to group related functionality. In Effective C++, in an item entitled Perfer non-member non-friend functions to member functions, Scott Meyers argues that this approach helps to increase extensiblity and (counter-intuitively) encapsulation.

All the functions are templated as I want to make them as easy to use as possible. There's still lots more I could add to these functions, for example the PowerOf functions only works with non-negative exponents, however each function was written or extended with a particular problem in mind. I'll add more functionality as I need it and in the mean time the functions should throw a useful exception if you ask them to do something they can't do.

There's also lots more I can do with regard to optimisation, and this should give me an opportunity to try out (and learn) some new C++11 features. For example PowerOf and Factorial could benefit from constexpr, although I've just been looking at the release of Visual Studio 2013 and I don't think this feature has made it in.  I guess I can achieve the same result with template metaprogramming if I really want quicker run-time code.

Anyway, my MathUtils are in place and poised to help out on more Project Euler problems.

Sunday 6 October 2013

Computing the Collatz sequence

Problem 14 from ProjectEuler concerns the Collatz sequence. From the problem's description:

The following iterative sequence is defined for the set of positive integers:
n → n/2 (n is even)
n → 3n + 1 (n is odd)

This shows how to compute the Collatz sequence for n. As an example, the Collatz sequence for 13 is:
13 → 40 → 20 → 10 → 5 → 16 → 8 → 4 → 2 → 1

The problem is to find which number, under one million, has the longest Collatz sequence. I took the following approach:

From the above example, the length of the Collatz for 13 is equal to the length for 40's  sequence plus one. In general, the length of the Colaltz sequence for n is equal to the length for second term of that sequence plus one. This can be implemented neatly with the recursive function:
uint getCollatzSeriesLength(uint value)
{
  uint seriesLength;

  if (value == 1) {
    seriesLength = 1;
  }
  else {
    if (value%2 == 0) {
      seriesLength getCollatzSeriesLength(value/2) + 1;
    }
    else {
      seriesLength = getCollatzSeriesLength(value*3+1) +1;
    }
  }
  
  return seriesLength;
}

The recursion is terminated by returning a length of 1 for a value 1. Using the above function for it takes about 580 milliseconds to calculate the Collatz series length for every integer from 1 to 1,000,000 on my machine.

I then made the following optimisation. Consider again the Collatz series of 13:
13 → 40 → 20 → 10 → 5 → 16 → 8 → 4 → 2 → 1
If the program starts at 1 then when it comes to 13 it has already calculated the series length for 10. If this value was cached then it could be returned without recursing all the way to the end of the series. Therefore when when the algorithm calculates a series length it should cache that, and the function should check that cache before calculating the series length for the next term. Note an upshot of this is that the series length's for 40 and 20 will be calculated in the course of calculating the series length of 13. The updated function looks like this (changes highlighted):
uint getCollatzSeriesLength(uint value)
{
  uint seriesLength;

  if ((value <= noTerms) && (lengthsCache[value] != 0)) {
    seriesLength = lengthsCache[value];
  }
  else if (value == 1) {
    seriesLength = 1;
  }
  else {
    if (value%2 == 0) {
      seriesLength = getCollatzSeriesLength(value/2) + 1;
    }
    else {
      seriesLength = getCollatzSeriesLength(value*3+1) +1;
    }
    if (value <= noTerms) {
      lengthsCache[value] = seriesLength;
    }
  }

  return seriesLength;
}

Here noTerms is the number of terms to be checked (one million) and lengthsCached is a vector initialised with one million and one zeros (so it's effectively indexed from one). This optimisation makes around a 21x difference in terms of speed, as it now takes about 28 milliseconds to run the program.

I made a final optimisation by considering that when the series length has been calculated for some value, n, then the series length for 2n will be one greater. Using this the following ammendment was made to the code:
uint getCollatzSeriesLength(uint value)
{
  uint seriesLength;

  if ((value <= noTerms) && (lengthsCache[value] != 0)) {
    seriesLength = lengthsCache[value];
  }
  else if (value == 1) {
    seriesLength = 1;
  }
  else {
    if (value%2 == 0) {
      seriesLength = getCollatzSeriesLength(value/2) + 1;
    }
    else {
      seriesLength = getCollatzSeriesLength(value*3+1) +1;
    }
    auto additionalSteps = 0;
    for (auto i=value; i<=noTerms; i*=2) {
      lengthsCache[value] = seriesLength additionalSteps++;
    }
  }

  return seriesLength;
}

This brings the program's execution time down by a further factor of ~2.3x to around 12 milliseconds on my machine.

Wednesday 2 October 2013

Project Euler Progress

I've been working through the problems on Project Euler and as I was making notes anyway I thought I'd put them in a log, on the web, hence this blog. If it serves no other purpose it will be somewhere to jog my memory on the things I've learned.

Project Euler (in case you haven't clicked on the link above) is a good way to exercise your programming chops. The problems are maths based but generally can't be solved with pen and paper as they involve too much work. For example, the first problem is "Find the sum of all multiples of 3 or 5 below 1000", and a simple C++ solutions is:
 int total = 0;
 for (int i=0; i<1000; i++) {
  if ((i%3==0) || (i%5==0)) {
   total += i;
  }
 }
 cout << "Answer is " << total << endl;
They get progressively more difficult from there but the emphasis stays on algorithms as opposed to mathematics.

To get started, here's my solution to Problem 24.


What is the millionth lexicographic permutation of the digits 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9?
I began by thinking about the simple case of the permutations of the digits 0, 1, and 2, let's say we want to find the fourth permutation. There are 3! (i.e. 6) elements in the list, and listed in lexicographic order they are:

012, 021, 102, 120, 201, 210

We can split the list into three sub-sections: two elements beginning with 0, two beginning with 1, and two beginning with 2. It's easy to see that the fourth element in the list is in the second of these sub-sections. Stated in terms of a zero indexed array: position 3 in the array is in position 1 of the array of sub-sections. We can use the size of these sub-sections, 2!, to calculate this result as 3/2! = 1 (because integers are truncated after division this works when the numbers don't divide evenly). The elements in this sub-sections start with the digit 1 and so the first digit of the answer is a 1.

Now we descend into the sub-section 102, 120. We just need to remove the digit 1 from the list of digits we're considering, recalculate the number of elements in this sub-section, and recalculated the position of our answer in this sub-section, then we can repeat the process until we've got the answer. The array is one element shorter than before so the new size is 2!. We can essentially discount all sub-sections before the one we're now in so the new position is the previous position minus 2!*1. Repeating the process brings us to the fourth element, 120.

This 3 digit example is trivial but the algorithm can be applied to a list with an arbitrary number of digits. Here's the C++ code to find the 1,000,000th lexicographic permutation of the digits 0 to 9:
auto pos = 999999; // 1,000,000th position, zero indexed
uint numDigits = 10;
vector<uint> digits(numDigits);
emuint n = 0;
generate(digits.begin(), digits.end(), [&n](){return n++;});

cout << "Answer is ";
for (uint i=numDigits; i>1; i--) {
  uint nextDigit = pos/Factorial(i-1);
  cout << digits[nextDigit];
  digits.erase(digits.begin()+nextDigit);
  pos -= nextDigit*Factorial(i-1);
}
cout << digits[0] << endl;
Note, this uses a factorial function, like one you could find here.