Sunday 22 December 2013

STL Tuples

The STL tuple class is a bit like the STL pair, except the tuple supports any number of types to be tied together in a single object. This can be achieved using variadic templates, although many tuple implementations were completed before variadic templates were available (e.g. VC++ 2012). They implemented the tuple by using different macros for different numbers of arguments.


STL pairs are useful but sometimes not sufficient

STL pairs are useful when two pieces of data are always coupled together, for example:

std::pair<std::string, float> raceWinner{ "M40", 17.20 };

However, in the past, to include any more pieces of data required some more work, for example creating a simple struct or class:

struct triathlonResult {
std::string ageCategory;
float swim;
float cycle;
float run;
};

which (if uniform initialization is supported) can then be used with:

triathlonResult triathlonWinner{ "M35", 13.45, 35.5, 17.3 };

STL tuples extend the pair to any number of arguments

The introduction of the tuple to the standard library means that a custom class is no longer needed, instead:

std::tuple<std::string, float, float, float> triathlonWinner( "F35", 13.45, 35.5, 17.3 );

can be used to create an object that holds the same information.

Initialization can be simplified using auto and make_tuple like this:

auto triathlonWinner = make_tuple("F35", 13.45, 35.5, 17.3);

An added convenience is that the comparison operators (==, !=, >, >=, < and <=) are included in the standard library. From the standard [tuple.rel] - "The elementary comparisons are performed in order from the zeroth index upwards. No comparisons or element accesses are performed after the first equality comparison that evaluates to false".


Retrieving elements from a tuple

Retrieving elements from a tuple may differ from how you're used to getting them from a pair. One way to retrieve members from a pair is to use the public members first and second. For example:

auto raceWinnersCategory = raceWinner.first;
auto raceWinnersTime = raceWinner.second;

However, there is no tuple member function to retrieve elements, instead they must be retrieved using the non-member function get (defined in <tuple>). The reason is that the element to retrieve must be indexed, and if this were to happen on a member function it would need to be invoked using the template keyword like this:

auto triWinnersCategory = triathlonWinner.template get<0>();
(for more information see [tuple.helper] note 11 from the standard)

To avoid this a non-member get function is provided, to be used as shown:
auto triWinnersCategory = get<0>(triathlonWinner);
auto triWinnersSwimTime = get<1>(triathlonWinner);
auto triWinnersCycleTime = get<2>(triathlonWinner);
auto triWinnersRunTime = get<3>(triathlonWinner);

This may not seem very natural to code at first but it is easy to read. Note that this get function can also be used for pairs, so if you suspect your pairs may one day be promoted to tuples then it's a good idea to use get with pairs too.

Sunday 15 December 2013

Uniform Initialization in VC++ 2013

Before uniform initialization was introduced to C++ there were many ways to initialize variables. For example:

int myInt(3);
int myInt = 3;
Musician guitarist; // Zero argument constructor
Musician drummer("Black", 10.01, true);

Uniform initialization provides a uniform way to initialize objects using curly braces. For example:

int myInt {4};
Musician piper {}; // Zero argument constructor
Musician jazzFlautist {"Black", 12.34, true};

Uniform initialization provides a standard method for initializing objects and for consistency alone should be the first preference. However, there are further advantages.

Further advantages of Uniform Initialization

The uniform initialization syntax can be used with the class std::initializer_list<> to pass a variable number of similar arguments to a constructor. This is useful for creating std containers, for example:

vector<int> myInts { 1, 3, 2, 4};

will create a vector of ints with four elements: 1, 2, 3, and 4To make use of std::initializer_list<> in your own classes you'll need to write an initializer list constructor, i.e. a constructor that takes an std::initializer_list<> of the desired type and parses it as required. If you do this users will be able to use the { } notation to invoke that constructor.

Uniform initialization can also be nested, for example:

vector<Musician> band { guitarist ,
                        jazzFlautist,
                        {"Blue", 43.21, false} };

In the above case guitarist and jazzFlautist are added to the vector, as is a third anonymous object that is created and copied (or hopefully moved) into the vector.

Uniform initialization also brings a way to avoid Most Vexing Parse confusion. For a simple illustration consider the following code:

Musician harpist();

This may be considered an acceptable way to create a Musician object called harpist with the zero argument constructor, but in fact it declares a function called harpist that takes zero arguments and returns a Musician. This ambiguity can take more subtle forms, for example where a function call is intended as an input parameter to a function call, but the compiler reads this as a function declaration that takes a function type as a parameter. It's so called the most vexing parse for good reason, but it is avoided using by the curly braces uniform initialization notation, i.e.

Musician harpist{};

Another advantage of uniform initialization is that it is a convenient way to initialise Aggregate classes. For example given the following definition:

struct number {
int asInteger;
std::string asString;
bool isPrime;
};

a number can be initialized with:

number theNumberFour{ 4, string("four"), false };

Reasons to still use the old constructor syntax

There may still be times when you wish to use the old constructor syntax. If a class has an initializer list constructor then it will take precedence over other constructors. If you wish to call the other constructors you'll need to use the old constructor syntax to be explicit.

For example, if you wish to create a std::vector of ints and you wish to call the constructor that takes a single parameter specifying the vector's size then writing:

std::vector<int> myInts{4};

won't work. The above code would create a vector of ints and initialize it with the single element 4. Instead you need to use the round bracket notation :

std::vector<int> myInts(4);

However, in general you should default to using uniform initialization, and only break that pattern when you've got a specific reason.

Sunday 8 December 2013

Writing an Arbitrarily Large Integer Class in C++

Project Euler Problem 25 is to find the first 1000 digit Fibonacci number. I'm implementing a solution in C++ so the problem here is that the largest number that can be represented with a built-in type is 264A few of the Project Euler problems prior to this one required handling numbers greater than 264 - for example Problem 13 is to find the 10 least significant digits of a sum of one-hundred 50-digit numbers - but in those cases it was possible to work around the 264 limitation by discarding digits that aren't needed in the final answer or using doubles.

However, this problem prompted me to write my own class that could deal with arbitrarily large integers. To this end I have created embiguint.


Main goals of EMBigUInt

The main goals of the class are as follows:
  • The class should work naturally with the build in types. For example it should support both:
    • embiguint = int + embiguint, and
    • embiguint = embituint + int
  • The class should provide implicit construction from the built in types.
  • Like the built-in types, the class should work with all the standard arithmetic operators (+, -, /, *, +=, *=, etc...) and comparison operators (==, >, >=, !=, etc...). These operators should work with embiguints and between embiguints and the built-in types.
  • The class should provide some additional functionality beyond what the built-in types, for example:
    • Construction from a string of digits to support constructing an embiguint greater than 264 e.g. 
      embiguint myBigUInt("1234567890123456789012")
    • Return a count of the number of digits.
There's still much more work to do to make the class I want, but here are some notes on design decisions I've taken so far:
  • Efficiency is not my first concern. My initial aim it to get a working class. When I've got all the functionality I need with unit tests and performance tests in place I can work on efficiency in the confidence that I won't break existing functionality and I can measure any speed gains. Some outcomes from this are:
    • I'm using standard library vectors, which are easy to use, provide lots of debugging help, and come with lots of standard library algorithms. It's worth investigating if raw arrays could improve efficiency, but for now this would slow down development.
    • I'm relying on implicit conversion from built-in types to embiguint for some of the functions. I could implement more efficient functions if I overloaded specifically for the built-in types but, again, this can be done as a refactoring.
  • So far, most of the functions in the library are non-member functions. This follows from the advise in Item 24 of Scott Meyers' Effective C++: Declare non-member functions when type conversion should apply to all parameters, and it allows me to do
    embiguint = int + embiguint

    Here I haven't overloaded
    operator+ for the type int, instead the non-member function 
    const embiguint operator+ (const embiguint& lhs, const embiguint& rhs)

    is being used with implicit type conversion converting the
    int to an embiguint.

Using EMBigUInt for Project Euler Problem 25

This class in its current state enables Project Euler Problem 25 to be solved very simply with the following:

embiguint lastValue = 0;
embiguint currentValue = 1;
emuint counter = 1;

while (currentValue.GetNoDigits() < 1000) {
  embiguint temp = currentValue;
  currentValue += lastValue;
  lastValue = temp;
  counter++;
}

cout << "Answer is " << counter << endl;

Wednesday 27 November 2013

Some STL functions are Simpler Than You Might Think

The STL functions generate and generate_n are quite useful, but when I first used them I thought I must be missing something. It turns out though that these functions, like many in the standard library, are really simple.

The first time I used generate_n was to initialise a vector. I looked at the documentation and quickly found how to achieve this, but sub-consciously I assumed I was only scratching the surface of what it could do. I probably thought that way about all STL functions, and in many cases I would have been right. For an example try stepping into the line:

std::cout << "Hello, world\n";

If you understand everything that happens in the above line of code you're probably someone who gets paid to implement the standard library. The fact that the operator<< is from the std namespace should have you scratching your head until you've read about Name Lookup, more specifically Koenig Lookup.

However, with generate_n, things are much more straight forward. In fact, generate_n is most easily understood, and in my case demystified, by looking at the implementation. Here's the crux of the function from VC++ 2012:

template<class _OutIt,
  class _Diff,
  class _Fn0> inline
  _OutIt _Generate_n(_OutIt _Dest, _Diff _Count, _Fn0 _Func)
  {
  for (; 0 < _Count; --_Count, ++_Dest)
    *_Dest = _Func();
  return (_Dest);
  }


That's as difficult as it gets

The above code is probably as easy to understand as any documentation on generate_n. But, more importantly, when you get this, you understand the function completely. There's nothing else to it, there's no smoke nor mirrors, no __asm__, and it doesn't link to any binaries to do the real work. It's just a small bit of C++ code.

The generate function is similar except it expects two input iterators and the for loop looks like:

for (; _First != _Last; ++_First)

If you look at the standard itself this shouldn't come as a surprise. After the function declaration there are just seven lines detailing what both generate and generate_n need to conform to. Many other useful STL functions have a similar level of complexity.

This won't come as a surprise to everyone, but for me it was encouraging to know that although I may never understand the language completely, there are many parts that I can get to the bottom of quite easily. It's also encouraged me to step into standard library functions as a first port of call, and shown me a less intimidating side of the C++ Standard itself. That said, I think I need some more experience before I'm ready to tackle:

std::cout << "Hello, world\n";

Tuesday 12 November 2013

Understanding Debug Iterators in VC++ 2012

In debug mode Visual C++ STL iterators can provide useful information at run-time via failed assertions, but there’s a performance cost and in certain scenarios they can seem to grind your program to a halt. This post shows why that happens, and how to fix it. Consider the following code:

vector<int> myInts;
myInts.push_bach(1);
cout << myInts[4] << endl;

It will compile O.K., and when done so in release mode the resultant program will happily reach into memory beyond the myInts vector and send whatever garbage it finds there to cout. However, in debug mode stepping over the third line produces the popup “Debug Assertion Failed… vector subscript out of range”. That may not be a big deal in the above case but if the index were calculated in a loop dependent on a number of variables that information could be very useful.

For another example consider what’s wrong with now adding the following lines:

auto it = myInts.begin();
myInts.push_back(2);
cout << *it << endl;

Chances are this code doesn't do what you want it to because pushing the 2 onto the vector probably requires it to grow, which requires it to move, which leaves the iterator it pointing at unallocated memory. Again, in release mode cout will get whatever it is now pointing at, but in debug mode we get “vector iterator not dereferencable”. While dereferencable may not be a real word you’ll soon figure out what it means and it will probably save some debugging time.

However, these extra checks come at a cost. Pretty much any call to an iterator involves a lot of run-time checks. For example dereferencing an iterator involves checking that the parent container isn’t null, checking the iterator’s internal pointer isn’t null, and checking that the iterator doesn’t point to memory before the parent vector’s first iterator, or after its last iterator. The parent container also has extra concerns when debug iterators are being used. All iterators that point into one container form a singly linked list. This allows the parent container to communicate with those iterators, but walking the list when it needs to is slow.

Example of slow iterators in debug mode

As a simple example, consider a list of 10,000 numbers from 1-100. I wish to find the number of pairs within that list that sum to 100. Here’s some code to achieve that:

auto noCombosSumTo100 = 0;
for (auto it1 = begin(myNums); it1 != end(myNums); it1++) {
  for (auto it2 = it1; it2 != end(myNums); it2++) {
    if (*it1 + *it2 == 100) {
      noCombosSumTo100++;
    }
  }
}

On my machine this takes about 34 milli-seconds fully optimised in release mode, and about 75 seconds in debug. That's about 2200x difference between debug and release!

Obviously this difference can't be entirely attributed to the iterators, there are lots of other opportunities for the compiler to optimise the above code in release mode. However, if I implement the same algorithm using vector indexing, like this:

for (auto index1 = 0; index1 < numTerms; index1++) {
  for (auto index2 = index1; index2 < numTerms; index2++) {
    if (mynums[index1] + mynums[index2] == 100) {
      noCombosSumTo100++;
    }
  }
}

it runs in about 4.5 seconds, so there's clearly a big cost to using debug iterators.

Ways to improve iterator performance

The improvements I'm about to show for the first code snippet could apply to any code, but it's worth understanding that any inefficiencies are exacerbated when debug iterators are involved. To improve the first code snippet while still getting the benefit of debug iterators I can:

These three modifications bring the execution time down to 21 seconds. Interestingly none of them make any difference in the second snippet where built in types are involved. Again, it's worth understanding that these improvements would apply to any user defined type (in debug mode), but it probably wouldn't be just as noticeable as when iterators are involved.

If you want a further speedup then you'll need to sacrifice some additional run-time checks by setting _ITERATOR_DEBUG_LEVEL=1 or 0 (This macro supersedes and combines the functionality of the older _SECURE_SCL and _HAS_ITERATOR_DEBUGGING macros). Doing so decreases the amount of debug help you get, but it brings the original example's execution time down to about 8.5 seconds.

At 8.5 seconds it's still about twice as slow as indexing the vectors, and this is because even with the reduced checking there are more indirections (extra function calls) associated with dereferencing a debug iterator.

In summary...

Remember, the example code used here takes 34 milli-seconds in release mode. The example using vector indexing also takes about 34 milli-seconds in release, and the same algorithm indexing a raw array of ints takes the same time to execute, about 34 milli-seconds. This indicates that the code is being optimised to similar assembly in all three cases.

So, to summarise, STL iterators provide useful debug information for development, they allow you to access STL containers as if you are using a pointer, and they create production code that is as fast as using raw pointers. However, there is a cost in debug mode as they can slow your code down a lot. But, if these costs become too much of a hindrance there are ways to reduce the effect without changing how you are using the iterators.

Friday 1 November 2013

Fast Implementation of the Factorial function in C++

I’ve been investigating the fastest way to implement the factorial function in my MathsUtils library. I had originally implemented it as simply:

template <typename T>
emu64 MathUtils::RecursiveFactorial(const T& x)
{
  return (x <= 1 ? 1 : x * Factorial(x - 1));
}

although I suspected this wasn’t the most efficient way, and I wanted to measure it.
To measure the efficiency of this implementation I used the following benchmark test:

for (ulonglong i=1; i<10000000; i++) {
  for (ulonglong j=1; j<20; j++) {
    RecursiveFactorial(j);
  }
}

(I don't find the factorial for anything greater than 20 as it would be too large to represent with a 64-bit unsigned number)

First I took a closer look at what the compiler was doing with the above factorial implementation, and the assembly for a debug build was as expected, it implements recursion. With this implementation the above test takes about 14 seconds on my machine in debug mode.

I found that the assembly the compiler writes for this function doesn't change regardless of if a const value is passed in. This would change in the future if I mark the function with constexpr. VC++ 2012 doesn’t support this feature yet but it would allow the compiler to evaluate a function's return value so you get it for free at run time.

I then thought about inlining the function. If the compiler did this blindly then it might hang while writing an inline function, inside an inline function, etc… Compilers are smarter than this though and looking at the assembly showed that using inline didn't make any difference to the output produced by the compiler. This is understandable as the inline keyword is only a hint to the compiler, one it is free to ignore.

I found, however, that it is possible to make the compiler inline a recursive function using the inline_recursing pragma (for VC++, there are equivalents for other compilers). When this is used the recursive function will be unrolled and written inline up to a maximum of 16 times, and this is also the default inline recursion depth. The depth can be changed using inline_depth.This Stack Overflow answer gives a good explanation of how a recursive function can be inlined to a depth of three.

To compile using these pragmas you need to turn off some debugging options, so after I'd seen from the assembly that the function was indeed being inlined I switched to release mode. Here are the results for the original RecursiveFactorial() function on the benchmark test:
  • Without inlining - 7.9 seconds
  • With inlining and a depth of five - 6.7 seconds.
I found that an inline recursion depth of five gave the best results.

As an interesting aside, the above test, shown here again:

for (ulonglong i=1; i<10000000; i++) {
  for (ulonglong j=1; j<20; j++) {
    RecursiveFactorial(j);
  }
}

actually runs instantaneously in release mode. This is because, in release mode, the compiler detects that the call to RecursiveFactorial() is dead code, so it's eliminated from the compiled assembly. To force it to execute I had to add the result to a running total, and even then the calls were skipped unless I used the total afterwards, so I printed it to the console.

Having spent some time tweaking the recursive factorial function I still had a suspicion that a simple for loop would be faster, so I tested the following:

template <typename T>
ulonglong ForLoopFactorial(const T& value)
{
  ulonglong returnValue = 1;
  for (ulonglong i = 2; i <= value; i++) {
    returnValue *= i;
  }
  return returnValue;
}

In release mode this takes 5.1 seconds to run on my machine. So, despite digging into the first option, and learning a few things, the simpler for loop wins for speed, so that's what I've updated my MathUtils library to use. Unit tests still pass so I haven't broken anything.

I've put the code for this spike on my GitHub.

Monday 21 October 2013

Some Math Utility Functions (on GitHub) to help out with Project Euler

I've been working through the problems on the Project Euler website recently, and as there were some maths functions that I needed over and over I decided to put them into a (C++) utility namespace, MathUtils, which I've put up on GitHub. This post is a short overview of what MathsUtils currently provides. The code should (hopefully) be self explanatory so I'll not provide much more detail here.

(I've also got a class up there called embiguint, which is an unsigned integer that can be arbitrarily large. It has proved useful in its current state but I intend to add a lot more functionality and make it behave more like a built in type at which point I'll do a blog post on it too.)

The functions currently included in MathsUtils are:

template <typename T> std::string
                     ToAlternateBaseRepresentation(
                     const T& value, const T& base);
template <typename T> std::vector<T>
                     GetDivisors(const T& number);
template <typename T> emu64 GetSumOfDivisors(const T& number);
template <typename T> emu64 Factorial(const T& x);
template <typename T> T
                     PowerOf(const T& base, const T& power);
template <typename T> std::vector<bool>
                     GetPrimesBoolArrayToN(const T& n);
template <typename T> std::vector<T> GetPrimesToN(const T& n);

I had originally developed the functions in a class with static members but I changed this to a namespace as it seemed more more natural. After all, classes are intended to define how to build objects whereas namespaces are used to group related functionality. In Effective C++, in an item entitled Perfer non-member non-friend functions to member functions, Scott Meyers argues that this approach helps to increase extensiblity and (counter-intuitively) encapsulation.

All the functions are templated as I want to make them as easy to use as possible. There's still lots more I could add to these functions, for example the PowerOf functions only works with non-negative exponents, however each function was written or extended with a particular problem in mind. I'll add more functionality as I need it and in the mean time the functions should throw a useful exception if you ask them to do something they can't do.

There's also lots more I can do with regard to optimisation, and this should give me an opportunity to try out (and learn) some new C++11 features. For example PowerOf and Factorial could benefit from constexpr, although I've just been looking at the release of Visual Studio 2013 and I don't think this feature has made it in.  I guess I can achieve the same result with template metaprogramming if I really want quicker run-time code.

Anyway, my MathUtils are in place and poised to help out on more Project Euler problems.

Sunday 6 October 2013

Computing the Collatz sequence

Problem 14 from ProjectEuler concerns the Collatz sequence. From the problem's description:

The following iterative sequence is defined for the set of positive integers:
n → n/2 (n is even)
n → 3n + 1 (n is odd)

This shows how to compute the Collatz sequence for n. As an example, the Collatz sequence for 13 is:
13 → 40 → 20 → 10 → 5 → 16 → 8 → 4 → 2 → 1

The problem is to find which number, under one million, has the longest Collatz sequence. I took the following approach:

From the above example, the length of the Collatz for 13 is equal to the length for 40's  sequence plus one. In general, the length of the Colaltz sequence for n is equal to the length for second term of that sequence plus one. This can be implemented neatly with the recursive function:
uint getCollatzSeriesLength(uint value)
{
  uint seriesLength;

  if (value == 1) {
    seriesLength = 1;
  }
  else {
    if (value%2 == 0) {
      seriesLength getCollatzSeriesLength(value/2) + 1;
    }
    else {
      seriesLength = getCollatzSeriesLength(value*3+1) +1;
    }
  }
  
  return seriesLength;
}

The recursion is terminated by returning a length of 1 for a value 1. Using the above function for it takes about 580 milliseconds to calculate the Collatz series length for every integer from 1 to 1,000,000 on my machine.

I then made the following optimisation. Consider again the Collatz series of 13:
13 → 40 → 20 → 10 → 5 → 16 → 8 → 4 → 2 → 1
If the program starts at 1 then when it comes to 13 it has already calculated the series length for 10. If this value was cached then it could be returned without recursing all the way to the end of the series. Therefore when when the algorithm calculates a series length it should cache that, and the function should check that cache before calculating the series length for the next term. Note an upshot of this is that the series length's for 40 and 20 will be calculated in the course of calculating the series length of 13. The updated function looks like this (changes highlighted):
uint getCollatzSeriesLength(uint value)
{
  uint seriesLength;

  if ((value <= noTerms) && (lengthsCache[value] != 0)) {
    seriesLength = lengthsCache[value];
  }
  else if (value == 1) {
    seriesLength = 1;
  }
  else {
    if (value%2 == 0) {
      seriesLength = getCollatzSeriesLength(value/2) + 1;
    }
    else {
      seriesLength = getCollatzSeriesLength(value*3+1) +1;
    }
    if (value <= noTerms) {
      lengthsCache[value] = seriesLength;
    }
  }

  return seriesLength;
}

Here noTerms is the number of terms to be checked (one million) and lengthsCached is a vector initialised with one million and one zeros (so it's effectively indexed from one). This optimisation makes around a 21x difference in terms of speed, as it now takes about 28 milliseconds to run the program.

I made a final optimisation by considering that when the series length has been calculated for some value, n, then the series length for 2n will be one greater. Using this the following ammendment was made to the code:
uint getCollatzSeriesLength(uint value)
{
  uint seriesLength;

  if ((value <= noTerms) && (lengthsCache[value] != 0)) {
    seriesLength = lengthsCache[value];
  }
  else if (value == 1) {
    seriesLength = 1;
  }
  else {
    if (value%2 == 0) {
      seriesLength = getCollatzSeriesLength(value/2) + 1;
    }
    else {
      seriesLength = getCollatzSeriesLength(value*3+1) +1;
    }
    auto additionalSteps = 0;
    for (auto i=value; i<=noTerms; i*=2) {
      lengthsCache[value] = seriesLength additionalSteps++;
    }
  }

  return seriesLength;
}

This brings the program's execution time down by a further factor of ~2.3x to around 12 milliseconds on my machine.

Wednesday 2 October 2013

Project Euler Progress

I've been working through the problems on Project Euler and as I was making notes anyway I thought I'd put them in a log, on the web, hence this blog. If it serves no other purpose it will be somewhere to jog my memory on the things I've learned.

Project Euler (in case you haven't clicked on the link above) is a good way to exercise your programming chops. The problems are maths based but generally can't be solved with pen and paper as they involve too much work. For example, the first problem is "Find the sum of all multiples of 3 or 5 below 1000", and a simple C++ solutions is:
 int total = 0;
 for (int i=0; i<1000; i++) {
  if ((i%3==0) || (i%5==0)) {
   total += i;
  }
 }
 cout << "Answer is " << total << endl;
They get progressively more difficult from there but the emphasis stays on algorithms as opposed to mathematics.

To get started, here's my solution to Problem 24.


What is the millionth lexicographic permutation of the digits 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9?
I began by thinking about the simple case of the permutations of the digits 0, 1, and 2, let's say we want to find the fourth permutation. There are 3! (i.e. 6) elements in the list, and listed in lexicographic order they are:

012, 021, 102, 120, 201, 210

We can split the list into three sub-sections: two elements beginning with 0, two beginning with 1, and two beginning with 2. It's easy to see that the fourth element in the list is in the second of these sub-sections. Stated in terms of a zero indexed array: position 3 in the array is in position 1 of the array of sub-sections. We can use the size of these sub-sections, 2!, to calculate this result as 3/2! = 1 (because integers are truncated after division this works when the numbers don't divide evenly). The elements in this sub-sections start with the digit 1 and so the first digit of the answer is a 1.

Now we descend into the sub-section 102, 120. We just need to remove the digit 1 from the list of digits we're considering, recalculate the number of elements in this sub-section, and recalculated the position of our answer in this sub-section, then we can repeat the process until we've got the answer. The array is one element shorter than before so the new size is 2!. We can essentially discount all sub-sections before the one we're now in so the new position is the previous position minus 2!*1. Repeating the process brings us to the fourth element, 120.

This 3 digit example is trivial but the algorithm can be applied to a list with an arbitrary number of digits. Here's the C++ code to find the 1,000,000th lexicographic permutation of the digits 0 to 9:
auto pos = 999999; // 1,000,000th position, zero indexed
uint numDigits = 10;
vector<uint> digits(numDigits);
emuint n = 0;
generate(digits.begin(), digits.end(), [&n](){return n++;});

cout << "Answer is ";
for (uint i=numDigits; i>1; i--) {
  uint nextDigit = pos/Factorial(i-1);
  cout << digits[nextDigit];
  digits.erase(digits.begin()+nextDigit);
  pos -= nextDigit*Factorial(i-1);
}
cout << digits[0] << endl;
Note, this uses a factorial function, like one you could find here.