Scott on Writing

Musings on technical writing...

How Do You Tell if a Number is Prime?

I just finished reading a blog entry on the new System.Collections.Generics.Dictionary class in Whidbey where Krzysztof Cwalina talked about the difference in conflict resolution between the Hashtable and the Dictionary classes.  (The Hashtable uses probing, the Dictionary chaining.)  When either data structure reaches its maximum size it must be expanded.  (This maximum size is defined as a percentage of full buckets for the Hashtable, as defined by its loadFactor; for the Dictionary, all of its buckets must be filled before being resized.)  When being resized, the algorithm goes as follows:

  1. Double the size of the number of buckets
  2. Find the next largest prime number; this prime number is the new number of buckets for the data structure.

For the Hashtable, which uses probing, it is of great importance that the hash and the number of buckets be relatively prime, in order to ensure that probing does not lead to conflicts itself.  This is ensured by having the number of buckets be prime, meaning it will be relatively prime with any number, save itself and 1.  But why does Dictionary need a prime number of buckets?  There may be some reason I'm missing, but I don't believe it's a requirement.

Anywho, a commentor on Krzysztof's blog asked why the number of buckets is expanded so.  That is, why double (and then some) the number of buckets, why not increase is by 10%, or 50%, or something user-defined?  From my understanding, this could be user-defined, but doubling the size of a buffer is a good heuristic.  It's what the ArrayList and List data structures do when you exceed their capacities.

But enough rambling, let me address the title line of this blog.  When expanding a Hashtable, it is important to end up with a prime number of buckets.  But how does one find a prime number?  Since a prime number is a number with only 1 and itself as factors, a naive approach would be to loop from 2 to the number - 1, and see if any numbers divided the number being tested.  If so, the number is not prime; if not, the number is prime.

While this will definitely work, it's a bit of overkill.  You can actually loop to just the square root of the number you want to test.  Why?  Well, say you want to know if x is prime.  If it is, then a number less than or equal to sqrt(x) must divide x.  To prove this, use a proof by contradiction.  Assume that x is composite and that it is not divisible by a number less than sqrt(x).  Then there exists some set of numbers, call them y and z, such that y * z = x, and y, z > sqrt(x).  If this is the case, then y * z must be greater than x, since sqrt(x) * sqrt(x) = x.  Since we have reached an absurd conclusion, the premise must be wrong, and therefore a composite number must have a factor less than or equal to its square root.  So you can just loop from 2 to sqrt(x) when checking for divisibility.

Even this approach is overkill, really.  I believe it was Euclid who showed that all integers can be written as a product of primes.  So rather than looping from 2 to sqrt(x), you really only need to check for divisibility against the prime numbers between 2 and sqrt(x).

Enough mathematics - how does .NET find the next largest prime number?  Well, the Hashtable maintains an array of 70 prime numbers that are roughly twice in size.  That is, it starts with 11, then 17, then 23, then 29, then 37, and so on, up to 7,199,369.  Obviously it skips over a lot of primes, since the 1,000 prime number is 7,919.  Given a number, we can find the prime larger than this number by consulting this pre-defined array.  But what if the number we have to find a prime larger than is, itself, larger than 7,199,369?  If this is the case, the .NET Framework falls back on the ol' tried and true measure - looping from 2 to sqrt(x) and checking for divisibility.

To learn more about Hashtables in the .NET Framework 1.x, check out: An Extensive Examination of Data Structures, Part 2.

posted on Friday, August 06, 2004 11:31 PM

Feedback

# re: How Do You Tell if a Number is Prime? 8/9/2004 1:12 PM Tom Pester

"But why does Dictionary need a prime number of buckets? There may be some reason I'm missing, but I don't believe it's a requirement."

I think it is to counter _bad_ hashfunctions who generate recurring patterns.
If the number of buckets isnt prime then clustering will occur.

A _good_ hash function & chaining doesnt need a prime number of buckets I think.

Title:  
Name:  
Url:
Protected by Clearscreen.SharpHIPEnter the code you see:
Comments   

Add To Your Reader

My Links

Archives

Post Categories

 

I am a Microsoft MVP for ASP.NET.
I am an ASPInsider.
<May 2008>
SMTWTFS
27282930123
45678910
11121314151617
18192021222324
25262728293031
1234567

Comment Stats

DayTotal% of Total
Sunday 1866.8%
Monday 37913.9%
Tuesday 45316.7%
Wednesday 50418.5%
Thursday 53519.7%
Friday 49418.2%
Saturday 1666.1%
Total 2717100.0%

Hour1Total% of Total
12:00 AM 652.4%
1:00 AM 682.5%
2:00 AM 622.3%
3:00 AM 742.7%
4:00 AM 572.1%
5:00 AM 1033.8%
6:00 AM 1084.0%
7:00 AM 1585.8%
8:00 AM 1716.3%
9:00 AM 1475.4%
10:00 AM 1716.3%
11:00 AM 1816.7%
12:00 PM 1886.9%
1:00 PM 1696.2%
2:00 PM 1605.9%
3:00 PM 1324.9%
4:00 PM 1073.9%
5:00 PM 923.4%
6:00 PM 913.3%
7:00 PM 963.5%
8:00 PM 833.1%
9:00 PM 782.9%
10:00 PM 792.9%
11:00 PM 772.8%
Total 2717100.0%

Comments by Blog Entry Date/Time

Day Entry MadeAvg.Total
Sunday 5.54144
Monday 5.22339
Tuesday 4.28419
Wednesday 7.67637
Thursday 6.90607
Friday 5.48411
Saturday 5.33160
Total 5.842717

Hour1 Entry MadeAvg.Total
12:00 AM 5.0035
1:00 AM 1.002
5:00 AM 0.000
7:00 AM 7.0035
8:00 AM 5.35107
9:00 AM 6.32278
10:00 AM 6.47246
11:00 AM 4.41181
12:00 PM 6.88330
1:00 PM 3.00111
2:00 PM 5.41222
3:00 PM 8.64285
4:00 PM 4.0589
5:00 PM 5.92154
6:00 PM 4.52113
7:00 PM 9.67174
8:00 PM 9.80147
9:00 PM 5.05111
10:00 PM 5.4265
11:00 PM 4.5732
Total 5.842717

Learn More About Comment Stats
1 - All times GMT -8...


Blog Stats

Favorite Web Sites

My Books

My MSDN Articles