Deleting All Records In a Table EXCEPT For the N Most Recently Added Records
I recently ran into a situation where I needed to delete all records from a table except for the 1,000 most recently added records. Specifically, I was working on a site that used Error Logging Modules And Handlers (ELMAH), a free, open source error logging library for ASP.NET applications. ELMAH doesn't automatically prune its error log. If you are using the database as a log source and if your web hosting provider has a database quota in effect, it is possible to have ELMAH's error log grow so large that the quota is surpassed and the database is taken offline. To help mitigate this problem, it is a good idea to take steps to keep ELAMH's error log size in check:
As with any sort of logging service is is important that ELMAH's log be periodically pruned. If you let ELMAH's log grow unchecked it can reduce performance when querying the log and suck up disk space, which is especially important in hosted environments where there are typically hard disk quote limits for each user on the database server. The good news is that there are a number of techniques you can employ to help ensure that your ELMAH error log stays a reasonable size.
- Use error filtering.
- Setup a weekly job in SQL Server to delete entries older than, say, three months.
- Update the ELMAH_LogError stored procedure to delete old log entries.
Keep in mind that trimming the error log brings with it a tradeoff: you are removing error log entries that might be important for analysis later in time. If this is the case, if you think you might need to review that error log from more than 90 days in the past, then before deleting records from the error log you should archive them somewhere.
While these approaches are certainly good measures to implement, they don't handle the case where there's a flood of errors in a short amount of time. If ELMAH's error log is being inundated with errors from the past 24 hours, a script that deletes error log entries older than three months is going to have no effect.
An alternative approach is to delete all entries in ELMAH other than the most recently added N error log entries. This ensures that the error log never exceeds a maximum number of records (N), while keeping the N most recent records. The following query will do the trick:
DELETE FROM ELMAH_Error
WHERE ErrorId NOT IN
(SELECT TOP 1000 ErrorId
ORDER BY TimeUtc DESC)
The ErrorId column is the primary key in the ELMAH_Error table, and TimeUtc stores the UTC date/time the error was recorded. The subquery gets the first 1,000 ErrorId values ordered by the time the error was logged, from the most recently logged to the oldest. The query deletes all error records whose ErrorId values are not in that set of the 1,000 most recently added errors.
If you are using SQL Server 2005 or beyond, you can use the ROW_NUMBER() keyword in a query like so:
DELETE FROM ELMAH_Error
WHERE ErrorId IN (
SELECT ErrorId, ROW_NUMBER() OVER(ORDER BY TimeUtc DESC) AS RowIndex
) AS ErrorsWithRowNumbers
WHERE RowIndex > 1000
The innermost query uses the ROW_NUMBER() keyword to assign a row number to each record from ELMAH_Error, numbering the most recently logged error 1, the next one 2, and so forth, such that the oldest error will have the largest row number assigned. This innermost query is used as a derived table. The query that uses it gets the ErrorId values whose row number is greater than 1,000, which returns those ErrorId values for errors that are “ranked” 1,001 or higher. In other words, it returns those ErrorId values for errors that are not one of the 1,000 most recently added errors. Finally, this set of ErrorIds are deleted from the table.
I ended up using the second query (the one with ROW_NUMBER()), placing it in the ELMAH_LogError stored procedure (along with a DELETE statement that removes any error log entries older than 3 weeks). If you use either one of these queries, consider adding an index on the TimeUtc column sorted in descending order.