Tuesday, April 9, 2013

Segmented Sieve of Eratosthenes

From my last playing around with primes post, I've finally managed to segment the sieving algorithm. This does a few different things:

  • Reduces memory usage. Now the usage is proportional to the number of primes found plus some small fixed buffer.
  • Increases locality when sieving. This should help with performance.
  • Hopefully makes parallelization easier/possible.
  • Finding the first x primes becomes a relatively straightforward and quick task.

Here's the code:

public static int[] seg_sieve(int n)
{
 if (n < 2)
 {
  return null;
 }
 int[] results = new int[(int) Math.ceil(1.25506 * n / Math.log(n))];
 results[0] = 2;
 results[1] = 3;
 int size = 1;

 // use 8kb buffer
 char[] is_composite = new char[0x1000];
 // base_shift_factor = 5
 // segment_shift_factor = log2(is_composite.length) + base_shift_factor
 final int seg_count = n - 2 >> 17;

 for (int seg = 0; seg <= seg_count; ++seg)
 {
  final int seg_start = (seg << 17) + 3;
  final int seg_end = Math.min((seg + 1 << 17) + 3, n);
  final int limit_i = (int) Math.ceil(Math.sqrt(seg_end));

  Arrays.fill(is_composite, (char) 0);

  // pre-sieve primes we've already found
  for (int i = 1; i < size && results[i] <= limit_i; ++i)
  {
   // start at max of prime squared or minimum multiple of result in range that's not
   // also a multiple of 2
   int mark_start = seg_start / results[i] * results[i];
   if ((seg_start / results[i] & 0x1) == 1)
   {
    if (mark_start < seg_start)
    {
     mark_start += 2 * results[i];
    }
   }
   else
   {
    mark_start += results[i];
   }
   for (int j = Math.max(results[i] * results[i], mark_start); j < seg_end; j += results[i] << 1)
   {
    is_composite[j - seg_start >> 5] |= 1 << ((j - seg_start & 0x1F) >> 1);
   }
  }

  // start sieving
  for (int i = seg_start; i <= limit_i; i += 2)
  {
   if ((is_composite[i - seg_start >> 5] & 1 << ((i - seg_start & 0x1F) >> 1)) == 0)
   {
    // i is prime, mark off all multiples
    results[size++] = i;
    // start at i squared
    for (int j = i * i; j < seg_end; j += i << 1)
    {
     is_composite[j - seg_start >> 5] |= 1 << ((j - seg_start & 0x1F) >> 1);
    }
   }
  }
  // gather primes past sqrt(n) and are in range
  for (int i = Math.max(limit_i + (limit_i - 1) % 2, seg_start); i < seg_end; i += 2)
  {
   if ((is_composite[i - seg_start >> 5] & 1 << ((i - seg_start & 0x1F) >> 1)) == 0)
   {
    results[size++] = i;
   }
  }
 }
 // resize array to only fit primes
 int[] primes = new int[size];
 for (int i = 0; i < size; ++i)
 {
  primes[i] = results[i];
 }
 return primes;
}

I'm using a fixed 8kb buffer for sieving. Through some testing I found that this size performed best for my setup. I haven't tested it extensively yet for correctness, but it should give correct results. Some basic benchmarking produced promising results. Here's a short summary of timings for generating primes up to 0x3FFFFFFF (~1 billion):

  • Segmented code: ~5.8 sec
  • Array-storage code: ~9.7 sec
  • Arraylist somewhat obfuscated/optimized code: ~17.3 sec
  • Naive implementation: ~63 sec

Definitely some good performance gains, but there is still some room for improvement. I'm not sure where I'm going to look for performance next, or when. I may consider multi-threading as that should be a fairly straightforward job. Alternatively, I could try another re-write and try to incorporate some other techniques like a modulo 210 wheel factorization.

No comments :

Post a Comment