Scott on Writing

Musings on technical writing...

How Do You Tell if a Number is Prime?

I just finished reading a blog entry on the new System.Collections.Generics.Dictionary class in Whidbey where Krzysztof Cwalina talked about the difference in conflict resolution between the Hashtable and the Dictionary classes.  (The Hashtable uses probing, the Dictionary chaining.)  When either data structure reaches its maximum size it must be expanded.  (This maximum size is defined as a percentage of full buckets for the Hashtable, as defined by its loadFactor; for the Dictionary, all of its buckets must be filled before being resized.)  When being resized, the algorithm goes as follows:

  1. Double the size of the number of buckets
  2. Find the next largest prime number; this prime number is the new number of buckets for the data structure.

For the Hashtable, which uses probing, it is of great importance that the hash and the number of buckets be relatively prime, in order to ensure that probing does not lead to conflicts itself.  This is ensured by having the number of buckets be prime, meaning it will be relatively prime with any number, save itself and 1.  But why does Dictionary need a prime number of buckets?  There may be some reason I'm missing, but I don't believe it's a requirement.

Anywho, a commentor on Krzysztof's blog asked why the number of buckets is expanded so.  That is, why double (and then some) the number of buckets, why not increase is by 10%, or 50%, or something user-defined?  From my understanding, this could be user-defined, but doubling the size of a buffer is a good heuristic.  It's what the ArrayList and List data structures do when you exceed their capacities.

But enough rambling, let me address the title line of this blog.  When expanding a Hashtable, it is important to end up with a prime number of buckets.  But how does one find a prime number?  Since a prime number is a number with only 1 and itself as factors, a naive approach would be to loop from 2 to the number - 1, and see if any numbers divided the number being tested.  If so, the number is not prime; if not, the number is prime.

While this will definitely work, it's a bit of overkill.  You can actually loop to just the square root of the number you want to test.  Why?  Well, say you want to know if x is prime.  If it is, then a number less than or equal to sqrt(x) must divide x.  To prove this, use a proof by contradiction.  Assume that x is composite and that it is not divisible by a number less than sqrt(x).  Then there exists some set of numbers, call them y and z, such that y * z = x, and y, z > sqrt(x).  If this is the case, then y * z must be greater than x, since sqrt(x) * sqrt(x) = x.  Since we have reached an absurd conclusion, the premise must be wrong, and therefore a composite number must have a factor less than or equal to its square root.  So you can just loop from 2 to sqrt(x) when checking for divisibility.

Even this approach is overkill, really.  I believe it was Euclid who showed that all integers can be written as a product of primes.  So rather than looping from 2 to sqrt(x), you really only need to check for divisibility against the prime numbers between 2 and sqrt(x).

Enough mathematics - how does .NET find the next largest prime number?  Well, the Hashtable maintains an array of 70 prime numbers that are roughly twice in size.  That is, it starts with 11, then 17, then 23, then 29, then 37, and so on, up to 7,199,369.  Obviously it skips over a lot of primes, since the 1,000 prime number is 7,919.  Given a number, we can find the prime larger than this number by consulting this pre-defined array.  But what if the number we have to find a prime larger than is, itself, larger than 7,199,369?  If this is the case, the .NET Framework falls back on the ol' tried and true measure - looping from 2 to sqrt(x) and checking for divisibility.

To learn more about Hashtables in the .NET Framework 1.x, check out: An Extensive Examination of Data Structures, Part 2.

posted on Friday, August 06, 2004 11:31 PM

Feedback

# re: How Do You Tell if a Number is Prime? 8/9/2004 1:12 PM Tom Pester

"But why does Dictionary need a prime number of buckets? There may be some reason I'm missing, but I don't believe it's a requirement."

I think it is to counter _bad_ hashfunctions who generate recurring patterns.
If the number of buckets isnt prime then clustering will occur.

A _good_ hash function & chaining doesnt need a prime number of buckets I think.

Title:  
Name:  
Url:
Protected by Clearscreen.SharpHIPEnter the code you see:
Comments   

My Links

Ads Via DevMavens

Archives

Post Categories

 

I am a Microsoft MVP for ASP.NET.
I am an ASPInsider.
<March 2010>
SMTWTFS
28123456
78910111213
14151617181920
21222324252627
28293031123
45678910

Comment Stats

DayTotal% of Total
Sunday 2056.8%
Monday 42514.1%
Tuesday 51917.2%
Wednesday 55518.4%
Thursday 58019.2%
Friday 54718.1%
Saturday 1886.2%
Total 3019100.0%

Hour1Total% of Total
12:00 AM 782.6%
1:00 AM 812.7%
2:00 AM 682.3%
3:00 AM 822.7%
4:00 AM 692.3%
5:00 AM 1264.2%
6:00 AM 1183.9%
7:00 AM 1816.0%
8:00 AM 1926.4%
9:00 AM 1585.2%
10:00 AM 1886.2%
11:00 AM 1936.4%
12:00 PM 2016.7%
1:00 PM 1846.1%
2:00 PM 1695.6%
3:00 PM 1354.5%
4:00 PM 1153.8%
5:00 PM 1073.5%
6:00 PM 1013.3%
7:00 PM 1073.5%
8:00 PM 923.0%
9:00 PM 882.9%
10:00 PM 913.0%
11:00 PM 953.1%
Total 3019100.0%

Comments by Blog Entry Date/Time

Day Entry MadeAvg.Total
Sunday 4.97159
Monday 4.80384
Tuesday 4.04477
Wednesday 7.39680
Thursday 6.26676
Friday 5.07466
Saturday 4.78177
Total 5.403019

Hour1 Entry MadeAvg.Total
12:00 AM 5.2937
1:00 AM 1.002
5:00 AM 0.000
7:00 AM 3.8550
8:00 AM 3.72134
9:00 AM 6.06297
10:00 AM 5.63276
11:00 AM 4.22194
12:00 PM 6.16351
1:00 PM 3.09133
2:00 PM 4.89230
3:00 PM 7.64321
4:00 PM 4.00108
5:00 PM 6.07170
6:00 PM 4.64116
7:00 PM 8.95188
8:00 PM 8.63164
9:00 PM 5.00115
10:00 PM 6.31101
11:00 PM 4.5732
Total 5.403019

Learn More About Comment Stats
1 - All times GMT -8...


Blog Stats

Favorite Web Sites

My Books

My MSDN Articles