January 2011 - Posts

Removing Gaps and Duplicates from a Numeric Column in Microsoft SQL Server
18 January 11 08:35 PM | Scott Mitchell | 1 comment(s)

Here’s the scenario: you have a database table with an integral numeric column used for sort order of some other non-identifying purpose. Let’s call this column SortOrder. There are a many rows in this table. Every row should have a unique, sequentially increasing value in its SortOrder column, but this may not be the case – there may be gaps and/or duplicate values in this column.

For example, consider a table with the following schema and data:

EmployeeID

Name

SortOrder

1 Scott 1
2 Jisun 8
3 Alice 7
4 Sam 7
5 Benjamin 3
6 Aaron 9
7 Alexis 4
8 Barney 5
9 Jim 5

Note how the SortOrder column has some gaps and duplicates. Ideally, the SortOrder column values for these nine rows would be 1, 2, 3, …, 9, but this isn’t the case. Instead, the current values (in ascending order) are: 1, 3, 4, 5, 5, 7, 7, 8, 9.

Our task is to take the existing SortColumn values and get them into the ideal format. That is, after our modifications, the table’s data should look like so:

EmployeeID

Name

SortOrder

1 Scott 1
2 Jisun 8
3 Alice 6
4 Sam 7
5 Benjamin 2
6 Aaron 9
7 Alexis 3
8 Barney 4
9 Jim 5

Note how now there are now no gaps or duplicates in SortOrder.

The Solution: Ranking Functions, Multi-Table UPDATE Statements and Common Table Expressions (CTEs)

Microsoft SQL Server 2005 added a number of ranking functions that simplify assigning ranks to query results, such as associating a sequentially increasing row number with each record returned from a query or assigning a rank to each result. For example, the following query – which uses SQL Server’s ROW_NUMBER() function – returns the records from the Employees table with a sequentially increasing number associated with each record:

SELECT Name, 
       SortOrder, 
       ROW_NUMBER() OVER (ORDER BY SortOrder) AS NoGapsNoDupsSortOrder
FROM Employees

The above query would return the following results. Note how the data is sorted by SortOrder. There’s also a new, materialized column (NoGapsNoDupsSortOrder) that returns sequentially increasing values.

Name

SortOrder

NoGapsNoDupsSortOrder

Scott 1 1
Benjamin 3 2
Alexis 4 3
Barney 5 4
Jim 5 5
Alice 7 6
Sam 7 7
Jisun 8 8
Aaron 9 9

What we need to do now is take the value in NoGapsNoDupsSortOrder and assign it to the SortOrder column. If we had the above results in a separate table we could perform such an UPDATE, as SQL Server makes it possible to update records in one database table with data from another table. (See HOWTO: Update Records in a Database Table With Data From Another Table.)

While the results in the above grid are not in a table (but are rather the results from a query), the good news is that we can treat those results as if they were results in another table using a Common Table Expression (CTE). CTEs, which were introduced in SQL Server 2005, can be thought of as a one-off view; that is, a view that is created, defined, and used in a single SQL statement.

Putting it all together, we end up with the following UPDATE statement:

WITH OrderedResults(EmployeeId, NoGapNoDupSortOrder) AS 
(
    SELECT EmployeeId, 
              ROW_NUMBER() OVER (ORDER BY SortOrder) AS NoGapNoDupSortOrder
    FROM Employees
)
UPDATE Employees
    SET SortOrder = OrderedResults.NoGapNoDupSortOrder
FROM OrderedResults
WHERE Employees.EmployeeId = OrderedResults.EmployeeId AND 
       Employees.SortOrder <> OrderedResults.NoGapNoDupSortOrder

The above query starts by defining a CTE named OrderedResults that returns two column values: EmployeeId and NoGapNoDupSortOrder. It then updates the Employees table, setting its SortOrder column value to the NoGapNoDupSortOrder value where the Employees table’s EmployeeId value matches the OrderedResults CTEs EmployeeId value (and where the SortOrder does not equal the NoGapNoDupSortOrder).

For more information on CTEs, ranked results, and updating one table (Employees) with data from another table or CTE (OrderedResults), check out the following resources:

Happy Programming!

Filed under:
Customizing ELMAH’s Error Emails
06 January 11 10:40 PM | Scott Mitchell | 3 comment(s)

ELMAH (Error Logging Modules and Handlers) is my ASP.NET logging facility of choice. It can be added to a new or running ASP.NET site in less than a minute. It’s open source and it’s creator, Atif Aziz, remains actively involved with the project and can be found answering questions about ELMAH, from Stackoverflow to ELMAH’s Google Discussion group. What’s not to love about it?

ELMAH’s sole purpose is to log and notify developers of errors that occur in an ASP.NET application. Error details can be logged to any number of log sources – SQL Server, MySQL, XML, Oracle, and so forth. Likewise, when an error occurs ELMAH can notify developers by sending the error details to one or more email addresses.

The notification email message sent by ELMAH is pretty straightforward: it contains the exception type and message, the date and time the exception was generated, the stack trace, a table of all server variables, and the Yellow Screen of Death that was generated by the error.

image

Prior to sending the notification email, ELMAH’s ErrorMailModule class raises its Mailing event. If you create an event handler for this event you can inspect details about the error that just occurred and modify the email message that is about to be sent. In this way you can customize the notification email, perhaps setting the priority based on the error or cc’ing certain email addresses if the error has originated from a particular page on the website.

To create an event handler for the Mailing event, open (or create) the Global.asax file and add the following syntax:

void ErrorMailModuleName_Mailing(object sender, Elmah.ErrorMailEventArgs e)
{
    ...
}

In the above code snippet, replace ErrorMailModuleName with the name you assigned the ErrorMailModule HTTP Module. This module may be defined in one or two places: the <system.web>/<httpModule> section and/or the <system.webServer>/<modules> section. The following Web.config snippet shows both sections:

<system.web>
    <httpModules>
        <add name="ErrorMail" type="Elmah.ErrorMailModule, Elmah" />
        ...
    </httpModules>

    ...    
</system.web>

<system.webServer>
    <modules>
        <add name="ErrorMail" type="Elmah.ErrorMailModule, Elmah" preCondition="managedHandler" />
        ...
    </modules>
    
    ...
</system.webServer>

The name for the module in both sections should be the same - in the above snippet the name is ErrorMail. Consequently, to create an event handler for ELMAH’s Mailing event with the above configuration we would use the syntax:

void ErrorMail_Mailing(object sender, Elmah.ErrorMailEventArgs e)
{
    ...
}

Note that the Mailing event handler’s second input parameter is of type ErrorMailEventArgs. This class provides two helpful properties:

  • Error – the error that was just logged by ELMAH. This property is of type Elmah.Error and has properties like Exception, Message, User, Time, and so on.
  • Mail – the MailMessage object that is about to be sent. This gives you an opportunity to modify the outgoing email.

The following Mailing event handler shows how you could adjust the notification email based on the type of exception that occurred. Here, if the error that just occurred was an ApplicationException then the notification email is set to a High priority, it’s subject it changed to “This is a high priority item!”, and mitchell@4guysfromrolla.com is cc’d.

void ErrorMail_Mailing(object sender, Elmah.ErrorMailEventArgs e)
{
    if (e.Error.Exception is ApplicationException ||
        (e.Error.Exception is HttpUnhandledException && 
            e.Error.Exception.InnerException != null &&
            e.Error.Exception.InnerException is ApplicationException))
    {
        e.Mail.Priority = System.Net.Mail.MailPriority.High;
        e.Mail.Subject = "This is a high priority item!";
        e.Mail.CC.Add("mitchell@4guysfromrolla.com");
    }
}

Note that if an unhandled ApplicationException is what prompted ELMAH to record the error then by this point the original exception will have been wrapped in an HttpUnhandledException. So in the if statement above I check to see if the error’s Exception property is an ApplicationException or if it is an HttpUnhandledException exception with an InnerException that is an ApplicationException. If either of those conditions hold then I want to customize the notification email.

Happy Programming!

Filed under:
jQuery Usage Among Top Sites
06 January 11 12:11 AM | Scott Mitchell | with no comments

If you use jQuery on your website two things to consider are:

  1. What version of jQuery to use, and
  2. How should the jQuery library be referenced from your website

Concerning the first question… Ideally everyone would use the latest and greatest version of jQuery. With each new version, the guys and gals building jQuery fix bugs, add new and useful features, and improve the library’s performance. But with any updated product there are potentially breaking changes with each new release, so upgrading carries with it some friction in the form of regression testing your script (and any plug-ins you are using). So then the question really becomes, when does the benefits of the new version outweigh the cost of upgrading – and that is a question you’ll have to answer for yourself.

Concerning the second question… Rather than hosting the jQuery library locally, public facing websites should use a Content Delivery Network (CDN). In his blog post, 3 reasons why you should let Google host jQuery for you, Dave Ward provides an excellent summary of what a CDN is and why you should use one:

A CDN — short for Content Delivery Network — distributes your static content across servers in various, diverse physical locations. When a user’s browser resolves the URL for these files, their download will automatically target the closest available server in the network. …

Potentially the greatest (yet least mentioned) benefit of using … a CDN is that your users may not need to download jQuery at all.

No matter how aggressive your caching, if you’re hosting jQuery locally then your users must download it at least once. A user may very well have dozens of identical copies of jQuery-1.3.2.min.js in their browser’s cache, but those duplicate files will be ignored when they visit your site for the first time.

On the other hand, when a browser sees multiple subsequent requests for the same … hosted version of jQuery, it understands that these requests are for the same file. … This means that even if someone visits hundreds of sites using the same hosted version of jQuery, they will only have to download it once.

(As an aside, a common concern I hear from clients when I suggest using a CDN is the fear that if the CDN goes offline then their site will break. The good news is that you can use a CDN as your primary source for jQuery and provide a local, fall-back version should the CDN be down. This Stackoverflow question - Best way to use Google's hosted jQuery, but fall back to my hosted library on Google fail – shows a couple of ways to accomplish this.)

Dave Ward’s jQuery CDN Survey

In September 2010, Dave Ward ran an interesting experiment – he wrote some software that crawled the 200,000 most popular websites (as reported by Alexa) and examined how they referenced the jQuery library (if at all) and to answer questions like:

  • Did they use a CDN?
  • If so, which CDN?
  • And so on.

In 6,953 reasons why I still let Google host jQuery for me Dave shares his results, which I’ve summarized here:

  • Only one top 1,000 ranked Alexa site uses the Microsoft jQuery CDN (Microsoft.com)
  • 47 of the top 1,000 ranked Alexa sites use the Google CDN
  • 6,953 of the top 200,000 sites use the Google CDN

The lesson to take away from Dave’s study is that you should use the Google CDN to host the jQuery library because using that CDN gives you the greatest likelihood that your visitors already have that version of the jQuery in their browser cache.

Repeating Dave’s Study

I decided to repeat Dave’s study so that I could see what interesting unreported information lie in the data. So I whipped up my own application to crawl the 13,247 top rated Alexa sites using the Html Agility Pack to grab all <script> elements with an src attribute, saving the src path if it contained the substring “jquery”. Before showing you my data, let me repeat the same considerations/warnings Dave noted with regard to the accuracy of his survey:

I’ll be the first to admit that my approach is fraught with inaccuracies:

  • Alexa – Alexa itself isn’t a great ranking mechanism. It depends on toolbar-reported data and individual rankings must be taken with a grain of salt. However, I believe that aggregate trends across its top 200,000 sites represents a useful high-level view.
  • HTTP errors – About 10% of the URLs I requested were unresolvable, unreachable, or otherwise refused my connection. A big part of that is due to Alexa basing its rankings on domains, not specific hosts. Even if a site only responds to www.domain.com, Alexa lists it as domain.com and my request to domain.com went unanswered.
  • jsapi – Sites using Google’s jsapi loader and google.load() weren’t counted in my tally, even though they do eventually reference the same googleapis.com URL. Both script loading approaches do count toward the same critical mass of caching, but my crawler’s regex doesn’t catch google.load().
  • Internal usage – It’s not uncommon for sites to pare their landing pages down to the absolute bare minimum, only introducing more superfluous JavaScript references on inner pages that require them. Since I only analyzed root documents, I undercounted any sites taking that approach and using the Google CDN to host jQuery on those inner pages.

That’s a very thorough way of saying, These results are not definitive, but are meant to give a general overview or understanding of the jQuery and CDN usage landscape.

I leave you with what I found to be some interesting statistics…

Resurrecting the Microsoft CDN Bit By Bit

For die-hard supporters of the Microsoft CDN, you’ll be happy to know that there is now more than one top 1,000 ranked site that uses the CDN! In addition to Microsoft.com (rank 24), SparkStudios.com (rank 359) and XBox.com (rank 650) now also use the Microsoft CDN. However, none of these three sites (nor www.asp.net – rank 1,226) use the suggested CDN URL:

The CDN used to use the microsoft.com domain name and has been changed to use the aspnetcdn.com domain name. This change was made to increase performance because when a browser referenced the microsoft.com domain it would send any cookies from that domain across the wire with each request. By renaming to a domain name other than microsoft.com performance can be increased by as much to 25%. Note ajax.microsoft.com will continue to function but ajax.aspnetcdn.com is recommended.

Only two sites in my survey actually use the ajax.aspnetcdn.com - wwwwwwwwwww.net (rank 7,542) and mmajunkie.com (rank 8,057). A total of 20 websites in my survey use ajax.microsoft.com.

The Ten Most Popular Websites That Host jQuery at Google’s CDN

Here are the ten most popular sites that use jQuery hosted at the Google CDN, along with which version of jQuery they use:

Alexa Rank

Domain jQuery Version
10 twitter.com 1.3.0
98 fileserve.com 1.3.2
111 taringa.net 1.4.2
117 twitpic.com 1.4.2
123 xtendmedia.com 1.4.1
145 stumbleupon.com 1.4.2
174 guardian.co.uk 1.4.2
175 stackoverflow.com 1.4.2
187 imgur.com 1.4.1
204 reference.com 1.4.2

jQuery Version Popularity

The following graph shows the popularity of different versions of jQuery. The bar height represents the total number of sites in my survey that use the particular jQuery version, whereas the red portion indicates the number that host jQuery on the Google CDN.

jQueryVersionUsage

Note that my method for determining the jQuery version was by examining the URL itself and did not actually search for the version in the actual jQuery file. Google’s CDN uses URLs that embed the version number. Microsoft’s CDN embeds the version number in the file name. For those that hosted jQuery locally (or with some other CDN), I searched both the URL and file name for a version string. The vast majority of sites that self-host jQuery did not include any version identification in the URL or filename (e.g., the file was hosted at a path like /scripts/jquery.min.js) and therefore aren’t represented in this graph. However, I think the pattern here can be extrapolated to those site’s where the version number isn’t part of the URL/file name. Namely, version 1.3.2 and 1.4.2 are the most used.

For the record, there are only two sites in my survey that use version 1.3.0 – Twitter.com (rank 10) and MagicBricks.com (rank 4,324).

jQuery Usage in Aggregate

Of the 13,247 sites surveyed, more than 35% of the sites (4,689) use jQuery…

Of these 4,689 sites…

  • Only 18% of these sites use the Google or Microsoft CDNs…
    • 22 use the Microsoft CDN
    • 826 use the Google CDN
Filed under:
More Posts

Archives

My Books

  • Teach Yourself ASP.NET 4 in 24 Hours
  • Teach Yourself ASP.NET 3.5 in 24 Hours
  • Teach Yourself ASP.NET 2.0 in 24 Hours
  • ASP.NET Data Web Controls Kick Start
  • ASP.NET: Tips, Tutorials, and Code
  • Designing Active Server Pages
  • Teach Yourself Active Server Pages 3.0 in 21 Days

I am a Microsoft MVP for ASP.NET.

I am an ASPInsider.