Helloworld922's Blog: Segmented Sieve of Eratosthenes

From my last playing around with primes post, I've finally managed to segment the sieving algorithm. This does a few different things:

Reduces memory usage. Now the usage is proportional to the number of primes found plus some small fixed buffer.
Increases locality when sieving. This should help with performance.
Hopefully makes parallelization easier/possible.
Finding the first x primes becomes a relatively straightforward and quick task.

Here's the code:

public static int[] seg_sieve(int n)
{
 if (n < 2)
 {
  return null;
 }
 int[] results = new int[(int) Math.ceil(1.25506 * n / Math.log(n))];
 results[0] = 2;
 results[1] = 3;
 int size = 1;

 // use 8kb buffer
 char[] is_composite = new char[0x1000];
 // base_shift_factor = 5
 // segment_shift_factor = log2(is_composite.length) + base_shift_factor
 final int seg_count = n - 2 >> 17;

 for (int seg = 0; seg <= seg_count; ++seg)
 {
  final int seg_start = (seg << 17) + 3;
  final int seg_end = Math.min((seg + 1 << 17) + 3, n);
  final int limit_i = (int) Math.ceil(Math.sqrt(seg_end));

  Arrays.fill(is_composite, (char) 0);

  // pre-sieve primes we've already found
  for (int i = 1; i < size && results[i] <= limit_i; ++i)
  {
   // start at max of prime squared or minimum multiple of result in range that's not
   // also a multiple of 2
   int mark_start = seg_start / results[i] * results[i];
   if ((seg_start / results[i] & 0x1) == 1)
   {
    if (mark_start < seg_start)
    {
     mark_start += 2 * results[i];
    }
   }
   else
   {
    mark_start += results[i];
   }
   for (int j = Math.max(results[i] * results[i], mark_start); j < seg_end; j += results[i] << 1)
   {
    is_composite[j - seg_start >> 5] |= 1 << ((j - seg_start & 0x1F) >> 1);
   }
  }

  // start sieving
  for (int i = seg_start; i <= limit_i; i += 2)
  {
   if ((is_composite[i - seg_start >> 5] & 1 << ((i - seg_start & 0x1F) >> 1)) == 0)
   {
    // i is prime, mark off all multiples
    results[size++] = i;
    // start at i squared
    for (int j = i * i; j < seg_end; j += i << 1)
    {
     is_composite[j - seg_start >> 5] |= 1 << ((j - seg_start & 0x1F) >> 1);
    }
   }
  }
  // gather primes past sqrt(n) and are in range
  for (int i = Math.max(limit_i + (limit_i - 1) % 2, seg_start); i < seg_end; i += 2)
  {
   if ((is_composite[i - seg_start >> 5] & 1 << ((i - seg_start & 0x1F) >> 1)) == 0)
   {
    results[size++] = i;
   }
  }
 }
 // resize array to only fit primes
 int[] primes = new int[size];
 for (int i = 0; i < size; ++i)
 {
  primes[i] = results[i];
 }
 return primes;
}

I'm using a fixed 8kb buffer for sieving. Through some testing I found that this size performed best for my setup. I haven't tested it extensively yet for correctness, but it should give correct results. Some basic benchmarking produced promising results. Here's a short summary of timings for generating primes up to 0x3FFFFFFF (~1 billion):

Segmented code: ~5.8 sec
Array-storage code: ~9.7 sec
Arraylist somewhat obfuscated/optimized code: ~17.3 sec
Naive implementation: ~63 sec

Definitely some good performance gains, but there is still some room for improvement. I'm not sure where I'm going to look for performance next, or when. I may consider multi-threading as that should be a fairly straightforward job. Alternatively, I could try another re-write and try to incorporate some other techniques like a modulo 210 wheel factorization.

Helloworld922's Blog

Pages

Tuesday, April 9, 2013

Segmented Sieve of Eratosthenes

No comments :

Post a Comment