April 2005 - Posts

Need a GMail Invite? Drop Me a Line!
28 April 05 03:07 PM | Scott Mitchell

Earlier this week I posted a blog entry about using GMail as the data store for an online knowledge base. At the end of that blog, I wrote:

BTW, if you need a GMail account, feel free to drop me a line, I'd be happy to hook you up.

I was a bit surprised that a dozen or so folks emailed and asked for GMail invites. While GMail is still technically “invite only,” a few months ago Google raised the invite allotment from a measly 5 invites to 50. Assuming just one GMail user sent out all 50 invites, and each of those 50 recipients sent out 50 invites, and so on, it would take only six such iterations before the entire world's population had a GMail account. (Even with just 5 invites, it would take fewer than 15 such iterations to ensure everyone alive today had a GMail account.) I figured most everyone who wanted one would have one by now, but I guess that's not the case. So... here's a more public announcement - if you need a GMail invite, let me know, and I'll be happy to send one your way.

Why I Don't Use DataSets in My ASP.NET Applications
27 April 05 07:01 PM | Scott Mitchell

A couple weeks ago (April 19th) I gave a talk to the local San Diego ASP.NET SIG, and during the talk I mentioned that I personally rarely, if ever, use DataSets in my ASP.NET applications, sticking with the good ol' DataReader instead. Since then I have received a number of emails from attendees asking why I don't use DataSets. Rather than responding to each questioner individually, I decided to write an article explaining my rationale.

This article is currently being worked on over at 4Guys, and can be accessed at http://aspnet.4guysfromrolla.com/articles/050405-1.aspx. I'm posting this to my blog because I'm interested in hearing any comments/questions/feedback regarding the article or claims made within before going “live.”

Do you use DataSets in your ASP.NET applications? If so, why? Read the article and then tell me why you use DataSets, please. I'm looking for some good, real-world reasons as to why you should be using a DataSet. The only answers I've ever come up with are (a) because it's sometimes easier, and (b) a client wants a Web application that behaves precisely like a standard data-entry desktop application, where the user can make a bevy of changes, but no changes are committed to the database until they click some sort of "Update" button.
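For anyone who hasn't contrasted the two approaches recently, here's a minimal sketch of the pattern I typically stick with - a forward-only, read-only SqlDataReader bound straight to a data Web control. (This is just an illustration; the query, connection string, and class name are placeholders, not code from the article.)

using System.Data;
using System.Data.SqlClient;
using System.Web.UI.WebControls;

// Minimal sketch: bind a DataGrid straight to a forward-only DataReader
// rather than filling a DataSet first. (All names here are placeholders.)
public class ProductBinder
{
    public static void BindProducts(DataGrid grid, string connectionString)
    {
        SqlConnection conn = new SqlConnection(connectionString);
        SqlCommand cmd = new SqlCommand("SELECT ProductID, ProductName FROM Products", conn);
        conn.Open();

        // CloseConnection ties the connection's lifetime to the reader's
        SqlDataReader reader = cmd.ExecuteReader(CommandBehavior.CloseConnection);
        grid.DataSource = reader;
        grid.DataBind();
        reader.Close();   // also closes the underlying connection
    }
}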

Thanks!

Using Email as a Knowledge Base
25 April 05 09:42 PM | Scott Mitchell

A recent BBC article, E-mail is the New Database, examines how, as the storage capacity and search capabilities of email services have grown and improved over the past year, many folks have begun using their online email accounts as personal databases. Need to remember and be able to find that colleague's phone number from anywhere in the world? Email his contact information to your GMail account. Want to be able to peruse your TODO list at work? Email it to your GMail account before you leave home in the morning.

Anywho, this article got me to thinking - why not use GMail as an online knowledge base? I've already blogged before about the usefulness of GMail for managing high-volume listservs - with its threaded email views, virtually unlimited storage space, and killer search features, it makes email listservs enjoyable to use again. So why not create a GMail account that does nothing but serve as a repository for a gaggle of focused email listservs? Over time this GMail account would automatically be populated with users' questions and (more importantly) answers to those questions. When facing a particularly tough problem, the first stop - before searching the web - would be to log on and search this GMail account. It would be like searching a small corner of the web known to be highly focused on, say, server-side development using Microsoft technologies.

On Sunday I created a new GMail account and started signing up to ASP.NET-related listservs. There are a number over at ASPAdvice.com along with a number over at Yahoo! Groups. What's also neat is that the ASP.NET Forums allow you to receive an email whenever a new post is made to a specified forum. Similarly, you can configure Google Groups to send you a daily digest from a particular USENET newsgroup (such as microsoft.public.dotnet.framework.aspnet).

With these data sources alone, I've already amassed 417 “conversations” from around the web, highly focused on ASP.NET development. (That works out to an estimated 12 new conversations per hour - roughly 8,600 in a month, or over 100,000 in a year. The nice thing is that this archive can be searched just as quickly as the web can through Google, and space will (likely) never be an issue.) I've yet to need to search this GMail account for technical help, but I am expecting/hoping to find it: (a) easy to search, and (b) more relevant than a general web search via Google, especially since many of these resources (listservs in particular) are not archived on a website and therefore not indexed by Google's spiders. (I'll keep you posted as to how useful such a GMail account turns out to be...)

Another neat use for GMail! BTW, if you need a GMail account, feel free to drop me a line, I'd be happy to hook you up.

DevConnections 2005 Slides
20 April 05 08:48 AM | Scott Mitchell

As I had blogged about earlier, I spoke at the DevConnections conference in March of this year. A number of folks have emailed me asking for the slides/code I presented, so here it is: http://datawebcontrols.com/classes/ASPNETConnections2005.zip

I presented three sessions during this four-day conference:

  • Working with Client-Side Script - Examined techniques for injecting client-side script from server-side code. Examined a base page class that provided methods for accomplishing common client-side tasks. Looked at upcoming ASP.NET 2.0 features for working with client-side script.
  • Syndicating and Consuming RSS Content - Examined the RSS standard and the role of syndication. Examined techniques in both ASP.NET 1.x and ASP.NET 2.0 for syndicating and consuming RSS content. Showcased the open-source ASP.NET control RssFeed.
  • Working with HTTP Handlers and Modules - Examined the HTTP pipeline for ASP.NET pages, looking at how to serve custom types of content with HTTP Handlers and how to respond to request-level events using HTTP Modules. Dissected two live demos: an HTTP Handler for color-coding code snippets and an HTTP Module for logging unhandled exceptions. (A bare-bones handler sketch follows this list.)
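For those who missed the handlers/modules session, here's a bare-bones sketch of an HTTP Handler - nothing as fancy as the color-coding handler from the talk, and the class name and output are purely illustrative:

using System;
using System.Web;

// A bare-bones HTTP Handler: any URL mapped to it in web.config's <httpHandlers>
// section gets this custom response instead of a normal .aspx page.
public class HelloHandler : IHttpHandler
{
    public void ProcessRequest(HttpContext context)
    {
        context.Response.ContentType = "text/plain";
        context.Response.Write("Served by a custom HTTP Handler at " + DateTime.Now);
    }

    public bool IsReusable
    {
        // a single instance can safely service multiple requests
        get { return true; }
    }
}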

Enjoy!

onbeforeunload and eval Problems in IE
19 April 05 11:11 AM | Scott Mitchell

Just wrapped up this week's article for 4Guys, which takes a look at some common questions/problems readers have alerted me to regarding a past article of mine, Using ASP.NET to Prompt a User to Save When Leaving a Page. That previous article examined how to use the onbeforeunload client-side event along with some JavaScript to determine if a user was leaving a page after having modified some data-entry input fields but before having saved the changes. If the user attempted to leave the page without saving their changes - whether by clicking a link on the page, attempting to close the browser, having the browser “hijacked” by clicking a link in another program, and so on - a confirmation prompt would magically appear, warning the user that they were about to leave the page without saving their latest changes and letting them cancel leaving the page.

The previous article examined how to wrap up this client-side functionality into an ASP.NET base class that included methods to indicate what Web controls needed to be monitored for changes and what Button controls, when clicked, shouldn't trigger the client-side check (buttons like Save and Cancel shouldn't prompt the user, for example).
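If you haven't read that earlier article, the gist of the base class approach looks something like the following sketch. This is my own simplification - the class name and the emitted script are hypothetical, not the article's actual API:

using System;
using System.Web.UI;

// Simplified sketch of the idea: a base page that emits the onbeforeunload wiring.
// Monitored controls would set needToConfirm = true from their client-side onchange events,
// and "safe" buttons (Save, Cancel) would set it back to false before submitting.
public class WarnOnLeavePage : Page
{
    protected override void OnPreRender(EventArgs e)
    {
        base.OnPreRender(e);

        string script = @"<script language=""javascript"">
            var needToConfirm = false;
            window.onbeforeunload = function () {
                if (needToConfirm)
                    return 'You have unsaved changes on this page.';
            }
        </script>";

        // Returning a string from onbeforeunload causes the browser to display the prompt
        RegisterClientScriptBlock("WarnOnLeave", script);
    }
}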

Over the past several months I have gotten dozens of emails from readers regarding that particular article. The most common issue has been the prompt being displayed when a Web control with its AutoPostBack property set to True was changed. A fix for this is discussed in this week's article (An Update on Prompting a User to Save When Leaving an ASP.NET Page).

The other common question that I received was along the lines of, “I'm using a menu control and when I use the menu control to leave a page the prompt appears, as expected. However, when I click Cancel, to stay on the page, I see an 'Unspecified error' script error. Help!” I actually wasn't able to help much until recently, when by fortuitous circumstance I happened to be working on a project that used telerik's r.a.d. menu. This menu control exhibited the “Unspecified error” script errors others had notified me of when using onbeforeunload, and with a bit of investigation I found out why - r.a.d. menu uses eval() statements to redirect users to a different menu choice, and (for some reason) Internet Explorer doesn't like onbeforeunload and eval() statements.

In the end, telerik provided a workaround, although it's a bit of a cheeky workaround. I ended up having to surround the eval() statements with an empty try...catch block, like:

try { eval(...); } catch (blah) {}

That suppressed the script error messages in IE. (Firefox didn't freak out over this in the first place...) For more info, you can see a forum entry I made on this topic.

Talk Tonight on the Enterprise Library's Data Access Application Block
19 April 05 08:04 AM | Scott Mitchell

I'm giving a talk tonight at the ASP.NET Special Interest Group of the San Diego .NET User Group on the Enterprise Library, focusing in on the Data Access Application Block (DAAB). The talk runs from 6:30 pm to 8:00 pm at the Microsoft Office in UTC. Free pizza starting at 6:00 pm!

Here's a synopsis of the talk I'll be giving:

One of Microsoft's efforts over the past couple of years has been to provide developers with useful code libraries that illustrate best practices. To achieve this goal the Patterns and Practices Group has been tasked with developing numerous application blocks, which are open-source libraries aimed at solving common tasks. The aim of the application blocks is to reduce development cost and increase confidence. Costs are reduced because integrating the application blocks into a project saves the development time that would otherwise be required to build the functionality, and confidence in the application is increased because the application blocks are well tested and have been used by thousands of developers around the world, meaning any bugs are likely to have been discovered and squashed.

In January 2005 the Patterns and Practices Group released the Enterprise Library, a collection of seven application blocks that share a common design and code base. One of the most used application blocks is the Data Access Application Block (DAAB), which simplifies data access. This talk will provide an overview of the Enterprise Library along with an examination of using the Data Access Application Block in an ASP.NET Web application.
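To give a taste of what the DAAB buys you, here's a quick sketch of typical usage. I'm writing this from memory of the January 2005 release - the command wrapper class and method names changed in later versions, so treat this as a sketch rather than gospel and double-check against the shipped documentation:

using System.Data;
using Microsoft.Practices.EnterpriseLibrary.Data;

// Sketch of DAAB-style data access: no connection or command plumbing in the page code.
// The stored procedure name is a placeholder.
public class ProductsData
{
    public static IDataReader GetProducts()
    {
        // CreateDatabase() reads the default database instance from the configuration file
        Database db = DatabaseFactory.CreateDatabase();
        DBCommandWrapper cmd = db.GetStoredProcCommandWrapper("GetProducts");
        return db.ExecuteReader(cmd);
    }
}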

If you can't make the talk but are interested in the topic, check out An Introduction to Microsoft's Enterprise Library and Working with the Enterprise Library's Data Access Application Block. There's also a ton of great webcasts on EntLib as well.

It's Official: Beta 2 is Here
18 April 05 08:11 PM | Scott Mitchell

Microsoft has now officially released Beta 2 of the .NET Framework 2.0. Late last week the bits were available to MSDN Subscribers; they now are available to the masses. It appears that the Express Versions of Visual Studio can be downloaded by all, but for the full blown Visual Studio 2005 Beta 2 or SQL Server 2005 Beta 2 you need to be mailed the CDs/DVDs (if you are not an MSDN Subscriber).

As with Beta 2 of the .NET Framework 1.0, 2.0's Beta 2 includes a Go Live license, meaning that you can start developing real-world, live applications using the beta software (something that was prohibited with Beta 1). Microsoft even provides an (incomplete) list of Web host providers that support the Whidbey Beta 2.

Using a Base Class to Fiddle with a Page's Rendered Output
14 April 05 03:02 PM | Scott Mitchell

Imagine that you wanted to tweak a page's rendered output in some manner - perhaps inject a copyright notice at the bottom of the page or ensure that <link> tags to required stylesheets were present. There are a number of options available to make this happen in ASP.NET:

  1. The simplest (but least tenable) approach is to simply add the logic to each and every page that requires it. This can be a maintenance nightmare, however: if the logic changes, you need to revisit every page that has it hard coded. Similarly, adding new pages to the site that utilize this same logic requires copying and pasting code.
  2. A more centralized approach would be to tap into an application-level event in Global.asax, if one is available. For example, this might be a good choice if you wanted to apply some kind of special encryption or compression to the page's output.
  3. While the Global.asax approach is better than adding the logic to each individual page, it still has the problem of tightly coupling the logic with the application. If you wanted to replicate the logic in another Web application, for instance, you'd need to replicate the code. This takes us back to our initial problem - what happens if the logic needs to change or be updated? What happens when we want to add another Web application that uses this logic? We'd need to replicate the Global.asax code in each of these applications. Boo.

    A better, more loosely coupled approach is to use an HTTP Module. It has access to the same application-level events, and can be packaged up as a stand-alone assembly that can be added to or removed from Web applications as easily as you can add or remove a file from a directory. No recompilation/redeployment needed. (See Using HTTP Modules and Handlers to Create Pluggable ASP.NET Components for more info on using an HTTP Module in this manner, along with an examination of a global error logging component, ELMAH. A minimal module along these lines is also sketched at the end of this post.)
  4. A final approach is to use a base class that the ASP.NET pages in your application extend. The logic necessary can be placed in this base class. This technique is used in DotNetNuke (well, it was in version 2.x; I haven't poked around version 3 yet). DotNetNuke uses a base class to move the __VIEWSTATE hidden form field to the bottom of the <form> so as not to gunk up search engines or Google AdSense. Here's the base class (some code has been removed for brevity):

Public Class BasePage
    Inherits System.Web.UI.Page

    '
    ' This method overrides the Render() method for the page and moves the ViewState
    ' from its default location at the top of the page to the bottom of the page. This
    ' results in better search engine spidering.
    '
    Protected Overrides Sub Render(ByVal writer As System.Web.UI.HtmlTextWriter)
        Dim stringWriter As System.IO.StringWriter = New System.IO.StringWriter
        Dim htmlWriter As HtmlTextWriter = New HtmlTextWriter(stringWriter)
        MyBase.Render(htmlWriter)
        Dim html As String = stringWriter.ToString()

        Dim StartPoint As Integer = html.IndexOf("<input type=""hidden"" name=""__VIEWSTATE""")
        If StartPoint >= 0 Then 'does __VIEWSTATE exist?
            Dim EndPoint As Integer = html.IndexOf("/>", StartPoint) + 2
            Dim ViewStateInput As String = html.Substring(StartPoint, EndPoint - StartPoint)
            html = html.Remove(StartPoint, EndPoint - StartPoint)

            Dim FormEndStart As Integer = html.IndexOf("</form>") - 1
            If FormEndStart >= 0 Then
                html = html.Insert(FormEndStart, ViewStateInput)
            End If
        End If

        writer.Write(html)
    End Sub
End Class 'BasePage

For more information check out the DotNetNuke source or my latest 4Guys article, Using a Custom Base Class for your ASP.NET Page's Code-Behind Classes.
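And since option #3 above mentions the HTTP Module route, here's a minimal sketch of a module that appends a copyright notice to each HTML response. This is my own example, not DotNetNuke code, and the class name and notice text are made up; for wholesale rewriting of the rendered markup you'd want to plug a custom stream into Response.Filter instead of simply appending:

using System;
using System.Web;

// Minimal HTTP Module sketch: appends a copyright comment to the end of each HTML response.
// Register it in web.config under <httpModules> to plug it into any Web application.
public class CopyrightModule : IHttpModule
{
    public void Init(HttpApplication application)
    {
        application.EndRequest += new EventHandler(OnEndRequest);
    }

    private void OnEndRequest(object sender, EventArgs e)
    {
        HttpApplication application = (HttpApplication) sender;
        if (application.Context.Response.ContentType == "text/html")
            application.Context.Response.Write("\r\n<!-- Copyright 2005 YourSiteNameHere -->");
    }

    public void Dispose() { }
}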

Comment Spam - Scripts or Brute Force?
14 April 05 11:49 AM | Scott Mitchell

I've always assumed that comment spammers are using scripts to spread their evil, evil comment spam. My assumption is based on the following:

  1. Brute force comment spamming - actually visiting the site and entering a comment in by hand - is slow and inefficient.
  2. I personally know many bloggers who use CAPTCHAs on their site but leave commentAPI wide open, and their comment spam has plummeted to near zero. CAPTCHAs, though, are no biggie if you are brute forcing the comment spam entry, so if CAPTCHAs are stopping people it must be because of screen-scraping type scripts. (However, you'd think that it wouldn't be long before the bad guys smartened up and started using commentAPI to inject their spam.)

However, I am certain that a sizable percentage of comment spam is injected through brute force means - some poor slob taking time out of his life to visit a blog and post a comment in the hopes of improving his site's PageRank. And some of these comment spams are getting more clever, addressing other comments so as to appear valid, while hiding the spammy URL in the author's name portion. For example, today a comment was added to my last blog entry by a Mr. Stephen Bauer, MD, who happens to be a noted Asperger's specialist. Why he was commenting on my blog, I'm not sure, but his comment was definitely on topic. He said:

I agree with "haacked". This topic cannot be stressed enough in today everchanging, fast-moving times. Andrew was dead-on with his MSDN example. That has hit me many times with them. Other culprits are the various "ASP" websites out there that change their URLs.

Keep it real. Err, keep it the same!

The problem (other than the fact that the name being used is clearly fake)? The URL linked to from "Stephen Bauer, MD" points to a link farm site. This is an example of comment spam. In fact, I'd wager the last line - “Keep it real. Err, keep it the same!” - is a marker of sorts that this spammer can use at a later date to see if I allowed such comment spam entries to remain on ScottOnWriting.NET.

I detest comment spammers even more than email spammers. Sure, the volume of email spam is astronomically higher since the spammers have perfected their email spamming trade, but by the same token, the anti-spam tools for email have caught up as well - SpamBayes automatically keeps several thousand spam emails per month out of my Inbox. The comment spam will get worse as the spammers perfect their trade, I'm sure, but hopefully we'll see a similar rise in comment spam-fighting tools.

Please Don't Forget - URLs are a Public Interface!
12 April 05 06:23 PM | Scott Mitchell

When designing software that can be consumed by third-party applications, one rule is very important: public interfaces should not introduce breaking changes, ever. Never ever ever, not in a million years ever. If you release an upgrade with breaking changes to the public interface, those third-party apps that relied on your published interface will fail, and that will piss off two groups of people:

  1. The creators of the third-party applications, and
  2. The users using the third-party applications

I put the third-party application creators ahead of the users because they get the worst of it: the peeved users will blame the third-party application, even though it was just abiding by the documented interface.

This all seems like common sense, no? In COM development back in the day, that's all developers heard - if you must change the interface, you need to re-version. Ditto for Web services developers today. What is a bit baffling is why people don't treat website URLs as public interfaces that may be consumed by third-party applications, because that's precisely what they are. The third-party applications are the other Web pages that link to the URL.

There's never any justifiable reason for a URL that once existed to ever return a 404; a URL that does so is breaking the public interface, a contract implicitly signed by the website creator when he or she created the Web page. I don't care if you rearchitected your entire site. I don't care if it's a URL for a product you no longer sell - in that case, display a page kindly explaining that the product is no longer for sale, providing links to similar products/categories you do sell, or - gasp - a link to a site that sells the product. A contract is a contract is a contract. The only excuse for a URL's death is if the company running the website goes out of business, and even then it's a weak excuse.

When you let a URL die, its death ripples to all sites around the world that link to it. Users visiting those sites will now experience broken links, and blame not you, Mr. “I Don't Abide By My Public Interface,” but the site that linked to you expecting you to uphold your end of the bargain. Do you know how difficult it is to fix broken links? Sure, if you only have a few dozen broken links, no biggie. But what happens when, say, you have thousands of Web pages with links to a site like, say, MSDN, and then one day MSDN decides to rearchitect its site and all those links that used to work no longer do? What do you do when you start getting a torrent of emails saying, “These links are broken, what's wrong with you?” You start fixing them, naturally, one at a time, but that is painful, slow, prone to error, and totally unnecessary. (I use MSDN as an example because this is precisely what happened a couple of years ago. I wish I was kidding.)

Yes, I know there are technological solutions one could utilize to aid in finding broken links quickly, but the point remains: it's work/time/effort/energy that shouldn't need to be spent in the first place! And, sure, there are other approaches one could take when linking to others' sites, such as adding a layer of indirection. For example, rather than linking directly to an off-site URL, link to a redirect page on your own site, like /redir.aspx?ID=x, where x is some ID field in a database table that ties the link to the URL. That way, if there are multiple links to a single now-defunct URL across many pages, all of them can be fixed by updating the appropriate database record. While indirection has some nice side benefits - link click tracking, for example - it runs counter to the notion of the World Wide Web, in my opinion. Plus, search engines might not pick up on the link or give it proper context.
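Here's roughly what such a redirect page might look like. The Links table, its columns, and the connection string key are all hypothetical - this is only a sketch of the indirection idea:

using System;
using System.Configuration;
using System.Data.SqlClient;
using System.Web.UI;

// Sketch of /redir.aspx: look up the real URL by ID and send the visitor on their way.
public class Redir : Page
{
    protected override void OnLoad(EventArgs e)
    {
        base.OnLoad(e);

        int linkId = Convert.ToInt32(Request.QueryString["ID"]);
        string connString = ConfigurationSettings.AppSettings["connectionString"];

        using (SqlConnection conn = new SqlConnection(connString))
        {
            SqlCommand cmd = new SqlCommand("SELECT Url FROM Links WHERE LinkID = @LinkID", conn);
            cmd.Parameters.Add("@LinkID", linkId);
            conn.Open();

            string url = (string) cmd.ExecuteScalar();
            Response.Redirect(url);   // a click-tracking INSERT could go here first
        }
    }
}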

And you know, even if you do rearchitect your site, you can still save those old URLs. It's called URL rewriting, and it can be done in a number of ways, from ISAPI filters to 404 handlers that automatically redirect to the new URL. Yes, this could mean a good deal of work when rearchitecting a large site, but that's what's to be expected when you have such a large public interface. And regardless of how long it takes, that time and effort will pale in comparison to the energy required to fix broken links on all of the sites linking to yours.
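A bare-bones version of that idea in Global.asax might look like the following. The old and new paths here are made up; a real site would likely drive the mapping from a database table or config file rather than hard-coding it:

// Global.asax.cs sketch: catch requests for a retired URL and issue a permanent redirect.
protected void Application_BeginRequest(object sender, EventArgs e)
{
    string path = Request.Path.ToLower();

    if (path == "/products/oldwidget.aspx")           // hypothetical retired URL
    {
        Response.StatusCode = 301;                     // Moved Permanently
        Response.AddHeader("Location", "/products/newwidget.aspx");
        Response.End();
    }
}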

Ok, now that I got all that out, I feel better. :-)

Programmatically Removing a Page from the OutputCache
10 April 05 10:37 PM | Scott Mitchell

One of ASP.NET's major improvements over classic ASP is its powerful caching API, which, if rightly utilized, can lead to dramatic performance gains in a Web application. For example, in ASP.NET Micro-Caching: Benefits of a One-Second Cache, author Steve Smith talks about the advantages of using output caching for one-second intervals. If you've ever heard Rob Howard speak you'll no doubt have heard his points on the benefits of caching.

The ASP.NET data cache provides a number of methods to add, inspect, and remove items from the cache, allowing one to build a “caching administration page,” where a site admin can, in real time, inspect the items in the Web application's cache and opt to clear certain cached items. In fact, if you are going to be in Southern California in early May you can listen to Scott Cate talk about such an application:

What’s in your Cache? - Scott Cate
Do you use the cache for ASP.NET ? If so, do you know what’s in it? How do you remove an object from cache? Sure you can do it programmatically, but what if you had an admin interface to view your items in the cache, and remove them manually. Sure, touching the web.config clears the cache, but it’s an expensive operation that has much more overhead then just “clearing out the cache”. In this session I’ll show you and give you the code that I’ve written to build a cache viewer control panel, to take charge of the cache in your application.
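The data cache makes this kind of admin page straightforward, since the Cache object is enumerable. Here's a stripped-down sketch of the core idea - nothing close to Scott Cate's full control panel, and the class and method names are my own:

using System.Collections;
using System.Text;
using System.Web;

// Core of a "what's in your cache?" page: enumerate the data cache and remove items by key.
public class CacheInspector
{
    public static string ListCacheKeys()
    {
        StringBuilder keys = new StringBuilder();
        IDictionaryEnumerator item = HttpRuntime.Cache.GetEnumerator();
        while (item.MoveNext())
            keys.Append(item.Key.ToString() + "<br />");
        return keys.ToString();
    }

    public static void RemoveItem(string key)
    {
        HttpRuntime.Cache.Remove(key);
    }
}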

While the data cache provides a nice API to interact with the items in the cache, many developers have wondered if the output cache provides a similar interface. Specifically, I've seen many questions regarding how to programmatically remove a page from the output cache. This is a question that I first stumbled upon perhaps a year ago. In attempting to investigate the problem I turned immediately to Reflector, examining the OutputCacheModule HTTP Module. Examining the source, you'll find that the output cache contents are stored in a private member variable, thereby preventing examination or removal by a page developer. Ick.

So, for the past year or so, I assumed that it was impossible to remove an item from the output cache. And then, tonight, I came across another person asking this same question and (finally) did what I should have done when first exploring this question: I Googled it. One of the first 10 hits was a page from msdn.microsoft.com - Caching Page Output with Cache Key Dependencies. In this document I read the following:

You can explicitly remove any page from the output cache by calling the HttpResponse.RemoveOutputCacheItem method. You can do this from the global.asax file, a custom ASP.NET server control you have created, or from a page, depending upon the needs of your application.

Egad, so it is possible to programmatically evict the cached output of an ASP.NET page. In fact, as the title of the article implies, it is possible to base the output cache of a particular page on a dependency in the data cache, rather than relying just on the specified duration. Color me ignorant.
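In other words, evicting a page's cached output is a one-liner, and the cache key dependency trick lets you invalidate it whenever a related data cache entry changes. A quick sketch of both - these fragments would live in a page's code-behind, and the virtual path and cache key are just examples:

// Evict the cached output of a specific page by its virtual path (path is an example):
private void EvictProductsPage()
{
    HttpResponse.RemoveOutputCacheItem("/Products/Default.aspx");
}

// Inside an output-cached page: tie the page's cached output to a data cache entry.
// Removing or changing Cache["ProductsCacheKey"] then evicts the page's output as well.
private void CachePageWithDependency()
{
    Cache.Insert("ProductsCacheKey", DateTime.Now);
    Response.AddCacheItemDependency("ProductsCacheKey");
}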

This experience has taught me two things:

  1. While Reflector is oftentimes the de facto way for deducing how the bowels of the .NET Framework operate, relying on Reflector alone is a recipe for misinformation. Reflector acts as a sort of magnifying glass, allowing intense scrutiny of a specific point in the framework, but the framework is so unwieldy and expansive that focusing so intently on a single aspect can make one miss the forest for the trees.
  2. The absolutely, positively, without-question first thing one should do when searching for a solution to a problem is to use Google. And that should probably be the second thing done as well. As my last blog post revealed, chances are someone else has already experienced the exact same problem you have, so utilize their work/time rather than wasting your own.
Bug in the Hashtable Data Structure in .NET 1.x
09 April 05 07:39 PM | Scott Mitchell

Earlier today a colleague wondered if there was a built-in method in the .NET Framework to determine if a number is prime. (A prime number, you'll recall, is one whose only factors are 1 and itself.) There is such a method, but unfortunately it is not very usable for two reasons:

  1. It's a private method in the System.Collections.Hashtable class, meaning you can't call it from your application code, and, more importantly,
  2. It has a bug in it that will report a certain class of non-prime numbers as prime.

As I discuss in An Extensive Examination of Data Structures: Part 2, the Hashtable class maintains a prime number of buckets. When you add an item to the Hashtable, a hashing function is used to determine which bucket to place the data into. There is a chance, however, that another item is already occupying that same bucket - this is called a collision. When a collision occurs there are a variety of possible remedies; the Hashtable class uses a technique dubbed rehashing. Rehashing applies an alternative hash function to find a new bucket and tries that one out. If that one is also taken, the algorithm repeats with yet another hash function, and so on, until an empty bucket is found.

For performance reasons, it's important to have a healthy ratio of empty buckets to filled buckets. Consider the costs if a high percentage of buckets were filled - this would result in numerous collisions, thereby requiring that the hashtable spend an inordinate amount of time rehashing. Too low of a percentage and you have a waste of space. (For the incurably curious, 72% is the ratio Microsoft suggests, and is the default ratio. Interestingly, you cannot specify a higher ratio than this, but you can indicate that a lower ratio be used in one of the Hashtable's constructor overloads.)

Whenever you add an item to the Hashtable that pushes the ratio of filled buckets above the specified limit, the Hashtable increases its size. Specifically, the Hashtable is careful to resize itself to have a prime number of buckets. Why, you might ask? It has to do with the rehashing technique used. Recall that rehashing applies a new hash function each time a collision occurs in placing an item. That is, for any item being added (or searched for), it first uses hash function H1. If that leads to a collision, it attempts to resolve the collision using hash function H2. If that doesn't cut the mustard, on to H3 we go, and so on. For the Hashtable class, these hash functions are defined as:

Hk(item) = [GetHash(item) + k * (1 + (((GetHash(item) >> 5) + 1) % (hashsize – 1)))] % hashsize

With rehashing it is imperative that, for a given item, each hash function hashes to a unique key. This can be guaranteed if [GetHash(key) + k * (1 + (((GetHash(key) >> 5) + 1) % (hashsize – 1)))] and hashsize are relatively prime - that is, if these two numbers share no common factors. One way to guarantee that two numbers are relatively prime is to have one (or both) of them be prime. And that is precisely what the Hashtable class does by ensuring that hashsize - the number of buckets in the Hashtable - is a prime number.
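A quick way to see why the relatively-prime requirement matters: if the probe increment and the table size share a common factor, the probe sequence cycles through only a fraction of the buckets. Here's a toy illustration (my own example, not the Hashtable's actual code):

using System;

// Toy demo: with 10 buckets and a probe increment of 4 (common factor 2),
// probing from bucket 3 visits only 5 of the 10 buckets, forever.
public class ProbeDemo
{
    public static void Main()
    {
        int hashsize = 10, increment = 4, bucket = 3;
        for (int k = 0; k < 10; k++)
        {
            Console.Write(bucket + " ");        // prints: 3 7 1 5 9 3 7 1 5 9
            bucket = (bucket + increment) % hashsize;
        }
    }
}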

Ok, so the Hashtable needs to be able to pick a prime number for the number of buckets when it must increase its capacity. How does it do this? Well, it starts by defining its favorite 70 prime numbers in a private integer array. These prime numbers range from the smallest possible Hashtable capacity - 11 - all the way up to 7,199,369. They are not the first 70 prime numbers, mind you, just 70 prime numbers, each one, on average, 1.253 times larger than the last (assuming I did my math right). But what happens if we have a Hashtable with 7,199,369 buckets and need to grow it? What hashsize is chosen then?

The Hashtable class, in this case, reverts to the old-school way of finding a prime number - it starts at the minimum capacity needed, bumps it up to the nearest odd number if necessary, and keeps adding 2, checking each time to see whether the new number is prime. To check whether a number is prime, the Hashtable class calls its private Primep(int) method, which returns a Boolean indicating whether the passed-in integer is prime. The Primep(int) method's code is shown below:

private bool Primep(int candidate)
{
    if ((candidate & 1) == 0)
    {
        return (candidate == 2);
    }
    for (int num1 = 3; num1 < ((int) Math.Sqrt((double) candidate)); num1 += 2)
    {
        if ((candidate % num1) == 0)
        {
            return false;
        }
    }
    return true;
}

The method starts by asking, “Is candidate even?” If so, candidate is prime only if it equals 2. If candidate is odd, the code loops through the odd numbers from 3 up to (but not including) the square root of the number being tested, checking to see if any of them evenly divide candidate. If any of them do, then candidate is composite (not prime); otherwise, candidate is prime.

Wondering why we don't need to test all odd numbers between 3 and candidate - why we can stop at the square root? The reason is that if a number has no factor less than or equal to its square root, it can't have a non-trivial factor greater than the square root either, since you'd need a smaller number to multiply it by to reach the number being tested. More formally, assume that a number x has no factors less than or equal to sqrt(x), but that it does have a factor y > sqrt(x). Then there must exist some number z such that z * y = x; but z must be greater than sqrt(x) as well (since we assumed there's no factor <= sqrt(x)). So z * y > sqrt(x) * sqrt(x) = x, meaning z * y cannot equal x. Hence we have reached a contradiction.

If you look closely at the code used by the Hashtable class, though, you'll find an off-by-one bug. Note that the loop only runs while num1 is strictly less than the square root of candidate - it should be less than or equal to. Why does that matter? Well, consider a candidate that is the square of a prime, such as 25 or 49. Clearly these numbers are composite, but their only factor other than 1 and the number itself is precisely the number's square root - which never gets tested by the Primep() method! Hence, Primep() reports that numbers like 25 and 49 are prime. Oops.
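The fix is simply to make the loop's bound inclusive. A corrected version - my own sketch, not the code Microsoft later shipped - would look like:

// Corrected sketch: note the <= in the loop condition, so the square root itself gets tested.
private bool IsPrimeFixed(int candidate)
{
    if ((candidate & 1) == 0)
        return (candidate == 2);

    for (int divisor = 3; divisor <= (int) Math.Sqrt((double) candidate); divisor += 2)
    {
        if ((candidate % divisor) == 0)
            return false;
    }
    return true;
}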

Fortunately the Hashtable uses those predetermined prime numbers for capacities up to 7,199,369, but what if we have a Hashtable that exceeds this capacity? The Hashtable will then try to find a prime number larger than twice the current capacity. If, by misfortune, that search lands on a number that happens to be the square of a prime... well, your Hashtable's capacity will become a non-prime number, which can potentially break the rehashing technique. That is, when rehashing you might end up back at a bucket you've already tried. I imagine you might even end up in a loop, tirelessly hopping around filled buckets, never making progress. (The Hashtable is at least smart enough to throw an exception if it probes as many times as there are buckets...)

After I stumbled upon this bug in the .NET Framework I did a bit of Googling to see if what I had found had been discovered previously. And, sure enough, it has, as mentioned in this blog entry by Brad Abrams: Primes in the BCL... Part 2. Brad assures us that this will be fixed in 2.0.

Speaking of 2.0: with Whidbey a new Dictionary data structure is introduced, which uses chaining to handle collisions rather than rehashing. With chaining, each bucket maintains a list of the items that hash to it. When a collision occurs, the item is simply added to that bucket's list (as opposed to probing for a new bucket as with rehashing). More info at this blog entry by Krzysztof Cwalina, or you could just read An Extensive Examination of Data Structures Using C# 2.0: Part 2.

Getting Closer...
05 April 05 01:37 PM | Scott Mitchell
Well, March 31st came and went, and no 2.0 Beta 2. Bummer. But to keep us engaged, Microsoft recently released the ASP.NET 2.0 Beta 2 QuickStarts (by way of Darren Neimke's blog).
Visual Studio Hacks Available
04 April 05 08:23 PM | Scott Mitchell

Today I received my complimentary copy of James Avery's Visual Studio Hacks book, to which I contributed a handful of chapters. I assume this means that the book is now available, although Amazon.com lists its availability as June 30, 2005, which is odd since it lets me add it to my shopping cart....

Anywho, if you use Visual Studio in any sort of manner, I'm confident you'll find a number of tips, hints, and gems in this book for improving your productivity and your enjoyment of the IDE. So go buy a copy already. Or you can check out the book's companion site first, before buying: http://www.visualstudiohacks.com/
