## Tuesday, March 19, 2013

### More Fun with Sieving Primes

I was playing around with my prime sieving implementation and I managed to improve the performance by making some simple changes. The code looks quite a bit different, but functionally it does basically the same thing. The one big difference is that I switched the code to use arrays instead of ArrayLists. Through some testing I realized that there was a fairly significant performance difference between arrays and ArrayLists. The actual problem has to deal with Java generics and boxing of primitives. I may go more into the details at a later date. It's a rather specialized optimization and for the vast majority of programs shouldn't be necessary. Other than that, the code is generally more readable.

Here's the code:

/**
* Gets the list of all primes less than or equal to n. Uses a slight modification of the Seive of
* Eratosthenes
*
* @param n
* @return array of primes, or null if none
*/
public static int[] sieve(int n)
{
if (n < 2)
{
// no primes less than 2
return null;
}
char[] is_composite = new char[(n - 2 >> 5) + 1];
final int upper_limit = (is_composite.length << 5) + 1;
int limit_i = (int) Math.ceil(Math.sqrt(upper_limit));
int[] results = new int[(int) Math.ceil(1.25506 * n / Math.log(n))];
results[0] = 2;
int size = 1;

// i is the exact number
for (int i = 3; i < limit_i; i += 2)
{
if ((is_composite[i - 3 >> 5] & 1 << ((i - 3 & 0x1F) >> 1)) == 0)
{
// i is prime, mark off all multiples
results[size++] = i;
// start at i squared
for (int j = i * i; j < n; j += i << 1)
{
is_composite[j - 3 >> 5] |= 1 << ((j - 3 & 0x1F) >> 1);
}
}
}
// gather primes past sqrt(n)
for (int i = limit_i + 1; i <= n; i += 2)
{
if ((is_composite[i - 3 >> 5] & 1 << ((i - 3 & 0x1F) >> 1)) == 0)
{
results[size++] = i;
}
}
// resize array to only fit primes
int[] primes = new int[size];
for (int i = 0; i < size; ++i)
{
primes[i] = results[i];
}
return primes;
}

Benchmarking this code, I was able to generate primes up to 0x40000000 (~1 billion) in about 10-11 seconds, compared to about 17-18 seconds with the old code. Still not as good as primesieve, but a nice substantial improvement. I also ported the code to C++ and the performance was almost identical. There was ~5-10% difference in favor of C++, but not really significant and within the timing deviation. In the future I want to see what difference tiling and taking advantage of locality will provide. This code should make this task much easier as it's logically laid out and variables typically store actual logical values rather than intermediate values.