Monday 31 March 2014

Haskell Prime Number Generator (part 2 of 3)

In my previous post I detailed the following elegant but inefficient Haskell prime number generator:

primes (x:xs) = x : primes (filter ((/= 0) . (`mod` x)) xs)

This algorithm was taken from user gauchopuro in the Project Euler Problem 7 forum. In this post I'm going to look at a prime number generator also submitted in that forum.

Again, as I'm currently learning Haskell, I'm going to explain this function at the beginner level.


Prime Number Generator from user vineet:

isPrime x = isPrimeHelper x primes

isPrimeHelper x (p:ps)
        | p*p > x        = True
        | x `mod` p == 0 = False
        | otherwise      = isPrimeHelper x ps

primes = 2 : filter isPrime [3..]

This prime number generator is a bit longer than the previous one. It doesn't use a sieve, and it involves three functions, each of which, as I will show, is quite simple. The above code uses recursion, pattern matching, the filter function, the cons operator, and lazy evaluation, all of which are explained in the previous post. See that post for more details.

In addition, to parse the above code, you'll need to understand Haskell guards. The isPrimeHelper function is defined for three different conditions using the guard notation '|'. If the predicate after the guard evaluates to True then the function definition to the right of the corresponding equals is used. If more than one predicate evaluates to True only the first one is used. So, in the above example, if p*p > 0 then isPrimeHelper will return True. If p*p > 0 is False, then x `mod` p == 0 is evaluated next. If that's True then isPrimeHelper returns False. Otherwise isPrimeHelper returns the result of isPrimeHelper x ps.

Now to parse the above code. Firstly consider that the list of prime numbers is defined as:

primes = 2 : filter isPrime [3..]

This line is quite simple. It says that the list of prime numbers starts with 2, and is followed by all the numbers from 3 to infinity that return True when passed to isPrime. The isPrime function, again, looks like this:

isPrime x = isPrimeHelper x primes

This function uses the final component of this prime number generator, isPrimeHelper, to determine if the argument x is prime or not. Notice that it also uses primes which we've defined as the list of prime numbers. The list primes is infinite and so it's never complete, and you may think this would be a problem. However, as will be explained, this works thanks to Haskell's lazy evaluation and the way isPrimeHelper is written and called. Here's isPrimeHelper again:

isPrimeHelper x (p:ps)
        | p*p > x        = True
        | x `mod` p == 0 = False
        | otherwise      = isPrimeHelper x ps

Before explaining this code, here is an explanation of how the algorithm for isPrimeHelper works. It takes a number to check, x, and a list of primes (p:ps). If any number in the list of primes divides evenly into x then x is not prime, and the function returns False. The primes are checked one by one beginning with the smallest. If the algorithm reaches a primes that is greater than the square root of x then there's no need to check any further as x is prime (because any number x can have only one prime factor greater than the square root of x, and in the case of a prime number that factor is x itself).

The code for isPrimeHelper works as follows. The function uses guards (explained earlier). The first element from the list of primes passed in is p. If p*p > x, then p is greater than the square root of x, so x is a prime number and the function returns True. Otherwise the function checks if p is a factor of x. If so then x is not prime and isPrimeHelper returns False. Finally, if p isn't a factor of x then isPrimeHelper calls itself recursively with p being removed from the front of the list of primes.

Remember that the list of primes passed into isPrimeHelper initially has just the single element 2, and this list is extended as the algorithm runs. Due to lazy evaluation isPrimeHelper only requires the elements one at a time until it returns, and it never runs out of elements because one the conditions that lead to the function returning will always occur before the end of the list is reached.

This prime number generator is much faster that the previous one. Although it's a bit longer, each of the three functions it uses is quite simple to understand. In another post I'll show how to make this function a bit more performant, and in my opinion a bit more readable.