From my last playing around with primes post, I've finally managed to segment the sieving algorithm. This does a few different things:
- Reduces memory usage. Now the usage is proportional to the number of primes found plus some small fixed buffer.
- Increases locality when sieving. This should help with performance.
- Hopefully makes parallelization easier/possible.
- Finding the first x primes becomes a relatively straightforward and quick task.
Here's the code:
public static int[] seg_sieve(int n) { if (n < 2) { return null; } int[] results = new int[(int) Math.ceil(1.25506 * n / Math.log(n))]; results[0] = 2; results[1] = 3; int size = 1; // use 8kb buffer char[] is_composite = new char[0x1000]; // base_shift_factor = 5 // segment_shift_factor = log2(is_composite.length) + base_shift_factor final int seg_count = n - 2 >> 17; for (int seg = 0; seg <= seg_count; ++seg) { final int seg_start = (seg << 17) + 3; final int seg_end = Math.min((seg + 1 << 17) + 3, n); final int limit_i = (int) Math.ceil(Math.sqrt(seg_end)); Arrays.fill(is_composite, (char) 0); // pre-sieve primes we've already found for (int i = 1; i < size && results[i] <= limit_i; ++i) { // start at max of prime squared or minimum multiple of result in range that's not // also a multiple of 2 int mark_start = seg_start / results[i] * results[i]; if ((seg_start / results[i] & 0x1) == 1) { if (mark_start < seg_start) { mark_start += 2 * results[i]; } } else { mark_start += results[i]; } for (int j = Math.max(results[i] * results[i], mark_start); j < seg_end; j += results[i] << 1) { is_composite[j - seg_start >> 5] |= 1 << ((j - seg_start & 0x1F) >> 1); } } // start sieving for (int i = seg_start; i <= limit_i; i += 2) { if ((is_composite[i - seg_start >> 5] & 1 << ((i - seg_start & 0x1F) >> 1)) == 0) { // i is prime, mark off all multiples results[size++] = i; // start at i squared for (int j = i * i; j < seg_end; j += i << 1) { is_composite[j - seg_start >> 5] |= 1 << ((j - seg_start & 0x1F) >> 1); } } } // gather primes past sqrt(n) and are in range for (int i = Math.max(limit_i + (limit_i - 1) % 2, seg_start); i < seg_end; i += 2) { if ((is_composite[i - seg_start >> 5] & 1 << ((i - seg_start & 0x1F) >> 1)) == 0) { results[size++] = i; } } } // resize array to only fit primes int[] primes = new int[size]; for (int i = 0; i < size; ++i) { primes[i] = results[i]; } return primes; }
I'm using a fixed 8kb buffer for sieving. Through some testing I found that this size performed best for my setup. I haven't tested it extensively yet for correctness, but it should give correct results. Some basic benchmarking produced promising results. Here's a short summary of timings for generating primes up to 0x3FFFFFFF (~1 billion):
- Segmented code: ~5.8 sec
- Array-storage code: ~9.7 sec
- Arraylist somewhat obfuscated/optimized code: ~17.3 sec
- Naive implementation: ~63 sec
Definitely some good performance gains, but there is still some room for improvement. I'm not sure where I'm going to look for performance next, or when. I may consider multi-threading as that should be a fairly straightforward job. Alternatively, I could try another re-write and try to incorporate some other techniques like a modulo 210 wheel factorization.
No comments :
Post a Comment