Scott on Writing

Musings on technical writing...

Deleting All Records In a Table EXCEPT For the N Most Recently Added Records

I recently ran into a situation where I needed to delete all records from a table except for the 1,000 most recently added records. Specifically, I was working on a site that used Error Logging Modules And Handlers (ELMAH), a free, open source error logging library for ASP.NET applications. ELMAH doesn't automatically prune its error log. If you are using the database as a log source and if your web hosting provider has a database quota in effect, it is possible to have ELMAH's error log grow so large that the quota is surpassed and the database is taken offline. To help mitigate this problem, it is a good idea to take steps to keep ELAMH's error log size in check:

As with any sort of logging service is is important that ELMAH's log be periodically pruned. If you let ELMAH's log grow unchecked it can reduce performance when querying the log and suck up disk space, which is especially important in hosted environments where there are typically hard disk quote limits for each user on the database server. The good news is that there are a number of techniques you can employ to help ensure that your ELMAH error log stays a reasonable size.

  • Use error filtering.
  • Setup a weekly job in SQL Server to delete entries older than, say, three months.
  • Update the ELMAH_LogError stored procedure to delete old log entries.

Keep in mind that trimming the error log brings with it a tradeoff: you are removing error log entries that might be important for analysis later in time. If this is the case, if you think you might need to review that error log from more than 90 days in the past, then before deleting records from the error log you should archive them somewhere.

While these approaches are certainly good measures to implement, they don't handle the case where there's a flood of errors in a short amount of time. If ELMAH's error log is being inundated with errors from the past 24 hours, a script that deletes error log entries older than three months is going to have no effect.

An alternative approach is to delete all entries in ELMAH other than the most recently added N error log entries. This ensures that the error log never exceeds a maximum number of records (N), while keeping the N most recent records. The following query will do the trick:

DELETE FROM ELMAH_Error
WHERE ErrorId NOT IN
        (SELECT TOP 1000 ErrorId
         FROM ELAMH_Error
         ORDER BY TimeUtc DESC)

The ErrorId column is the primary key in the ELMAH_Error table, and TimeUtc stores the UTC date/time the error was recorded. The subquery gets the first 1,000 ErrorId values ordered by the time the error was logged, from the most recently logged to the oldest. The query deletes all error records whose ErrorId values are not in that set of the 1,000 most recently added errors.

If you are using SQL Server 2005 or beyond, you can use the ROW_NUMBER() keyword in a query like so:

DELETE FROM ELMAH_Error
WHERE ErrorId IN (
      SELECT ErrorId
      FROM
        (
           SELECT ErrorId, ROW_NUMBER() OVER(ORDER BY TimeUtc DESC) AS RowIndex
           FROM ELMAH_Error
        ) AS ErrorsWithRowNumbers
      WHERE RowIndex > 1000
)

The innermost query uses the ROW_NUMBER() keyword to assign a row number to each record from ELMAH_Error, numbering the most recently logged error 1, the next one 2, and so forth, such that the oldest error will have the largest row number assigned. This innermost query is used as a derived table. The query that uses it gets the ErrorId values whose row number is greater than 1,000, which returns those ErrorId values for errors that are “ranked” 1,001 or higher. In other words, it returns those ErrorId values for errors that are not one of the 1,000 most recently added errors. Finally, this set of ErrorIds are deleted from the table.

I ended up using the second query (the one with ROW_NUMBER()), placing it in the ELMAH_LogError stored procedure (along with a DELETE statement that removes any error log entries older than 3 weeks). If you use either one of these queries, consider adding an index on the TimeUtc column sorted in descending order.

Happy Programming!

posted on Thursday, October 01, 2009 8:50 AM

Feedback

# Dew Drop – October 2, 2009 | Alvin Ashcraft's Morning Dew 10/2/2009 4:47 AM Pingback/TrackBack

Dew Drop – October 2, 2009 | Alvin Ashcraft's Morning Dew

# 
Twitter Trackbacks for

Scott on Writing
[scottonwriting.net]
on Topsy.com
10/2/2009 7:46 AM Pingback/TrackBack


Twitter Trackbacks for

Scott on Writing
[scottonwriting.net]
on Topsy.com

# Daily tech links for .net and related technologies - October 2-4, 2009 10/3/2009 12:07 AM Sanjeev Agarwal

Daily tech links for .net and related technologies - October 2-4, 2009 Web Development Special Folders

# Deleting All Records In a Table EXCEPT For the N Most Recently Added Records 10/3/2009 9:00 AM PimpThisBlog.com

Thank you for submitting this cool story - Trackback from PimpThisBlog.com

# Weekly Link Post 113 « Rhonda Tipton’s WebLog 10/4/2009 5:43 PM Pingback/TrackBack

Weekly Link Post 113 « Rhonda Tipton’s WebLog

# Deleting All Records In a Table EXCEPT For the N Most Recently Added Records « Jasper Blog 10/12/2009 12:26 AM Pingback/TrackBack

Deleting All Records In a Table EXCEPT For the N Most Recently Added Records « Jasper Blog

Title:  
Name:  
Url:
Protected by Clearscreen.SharpHIPEnter the code you see:
Comments   

My Links

Ads Via DevMavens

Archives

Post Categories

 

I am a Microsoft MVP for ASP.NET.
I am an ASPInsider.
<March 2010>
SMTWTFS
28123456
78910111213
14151617181920
21222324252627
28293031123
45678910

Comment Stats

DayTotal% of Total
Sunday 2056.8%
Monday 42514.1%
Tuesday 51917.2%
Wednesday 55518.4%
Thursday 58019.2%
Friday 54718.1%
Saturday 1886.2%
Total 3019100.0%

Hour1Total% of Total
12:00 AM 782.6%
1:00 AM 812.7%
2:00 AM 682.3%
3:00 AM 822.7%
4:00 AM 692.3%
5:00 AM 1264.2%
6:00 AM 1183.9%
7:00 AM 1816.0%
8:00 AM 1926.4%
9:00 AM 1585.2%
10:00 AM 1886.2%
11:00 AM 1936.4%
12:00 PM 2016.7%
1:00 PM 1846.1%
2:00 PM 1695.6%
3:00 PM 1354.5%
4:00 PM 1153.8%
5:00 PM 1073.5%
6:00 PM 1013.3%
7:00 PM 1073.5%
8:00 PM 923.0%
9:00 PM 882.9%
10:00 PM 913.0%
11:00 PM 953.1%
Total 3019100.0%

Comments by Blog Entry Date/Time

Day Entry MadeAvg.Total
Sunday 4.97159
Monday 4.80384
Tuesday 4.04477
Wednesday 7.39680
Thursday 6.26676
Friday 5.07466
Saturday 4.78177
Total 5.403019

Hour1 Entry MadeAvg.Total
12:00 AM 5.2937
1:00 AM 1.002
5:00 AM 0.000
7:00 AM 3.8550
8:00 AM 3.72134
9:00 AM 6.06297
10:00 AM 5.63276
11:00 AM 4.22194
12:00 PM 6.16351
1:00 PM 3.09133
2:00 PM 4.89230
3:00 PM 7.64321
4:00 PM 4.00108
5:00 PM 6.07170
6:00 PM 4.64116
7:00 PM 8.95188
8:00 PM 8.63164
9:00 PM 5.00115
10:00 PM 6.31101
11:00 PM 4.5732
Total 5.403019

Learn More About Comment Stats
1 - All times GMT -8...


Blog Stats

Favorite Web Sites

My Books

My MSDN Articles