Removing Gaps and Duplicates from a Numeric Column in Microsoft SQL Server

Published 18 January 11 08:35 PM | Scott Mitchell

Here’s the scenario: you have a database table with an integral numeric column used for sort order of some other non-identifying purpose. Let’s call this column SortOrder. There are a many rows in this table. Every row should have a unique, sequentially increasing value in its SortOrder column, but this may not be the case – there may be gaps and/or duplicate values in this column.

For example, consider a table with the following schema and data:

EmployeeID

Name

SortOrder

1 Scott 1
2 Jisun 8
3 Alice 7
4 Sam 7
5 Benjamin 3
6 Aaron 9
7 Alexis 4
8 Barney 5
9 Jim 5

Note how the SortOrder column has some gaps and duplicates. Ideally, the SortOrder column values for these nine rows would be 1, 2, 3, …, 9, but this isn’t the case. Instead, the current values (in ascending order) are: 1, 3, 4, 5, 5, 7, 7, 8, 9.

Our task is to take the existing SortColumn values and get them into the ideal format. That is, after our modifications, the table’s data should look like so:

EmployeeID

Name

SortOrder

1 Scott 1
2 Jisun 8
3 Alice 6
4 Sam 7
5 Benjamin 2
6 Aaron 9
7 Alexis 3
8 Barney 4
9 Jim 5

Note how now there are now no gaps or duplicates in SortOrder.

The Solution: Ranking Functions, Multi-Table UPDATE Statements and Common Table Expressions (CTEs)

Microsoft SQL Server 2005 added a number of ranking functions that simplify assigning ranks to query results, such as associating a sequentially increasing row number with each record returned from a query or assigning a rank to each result. For example, the following query – which uses SQL Server’s ROW_NUMBER() function – returns the records from the Employees table with a sequentially increasing number associated with each record:

SELECT Name, 
       SortOrder, 
       ROW_NUMBER() OVER (ORDER BY SortOrder) AS NoGapsNoDupsSortOrder
FROM Employees

The above query would return the following results. Note how the data is sorted by SortOrder. There’s also a new, materialized column (NoGapsNoDupsSortOrder) that returns sequentially increasing values.

Name

SortOrder

NoGapsNoDupsSortOrder

Scott 1 1
Benjamin 3 2
Alexis 4 3
Barney 5 4
Jim 5 5
Alice 7 6
Sam 7 7
Jisun 8 8
Aaron 9 9

What we need to do now is take the value in NoGapsNoDupsSortOrder and assign it to the SortOrder column. If we had the above results in a separate table we could perform such an UPDATE, as SQL Server makes it possible to update records in one database table with data from another table. (See HOWTO: Update Records in a Database Table With Data From Another Table.)

While the results in the above grid are not in a table (but are rather the results from a query), the good news is that we can treat those results as if they were results in another table using a Common Table Expression (CTE). CTEs, which were introduced in SQL Server 2005, can be thought of as a one-off view; that is, a view that is created, defined, and used in a single SQL statement.

Putting it all together, we end up with the following UPDATE statement:

WITH OrderedResults(EmployeeId, NoGapNoDupSortOrder) AS 
(
    SELECT EmployeeId, 
              ROW_NUMBER() OVER (ORDER BY SortOrder) AS NoGapNoDupSortOrder
    FROM Employees
)
UPDATE Employees
    SET SortOrder = OrderedResults.NoGapNoDupSortOrder
FROM OrderedResults
WHERE Employees.EmployeeId = OrderedResults.EmployeeId AND 
       Employees.SortOrder <> OrderedResults.NoGapNoDupSortOrder

The above query starts by defining a CTE named OrderedResults that returns two column values: EmployeeId and NoGapNoDupSortOrder. It then updates the Employees table, setting its SortOrder column value to the NoGapNoDupSortOrder value where the Employees table’s EmployeeId value matches the OrderedResults CTEs EmployeeId value (and where the SortOrder does not equal the NoGapNoDupSortOrder).

For more information on CTEs, ranked results, and updating one table (Employees) with data from another table or CTE (OrderedResults), check out the following resources:

Happy Programming!

Filed under:

Comments

No Comments

Archives

My Books

  • Teach Yourself ASP.NET 4 in 24 Hours
  • Teach Yourself ASP.NET 3.5 in 24 Hours
  • Teach Yourself ASP.NET 2.0 in 24 Hours
  • ASP.NET Data Web Controls Kick Start
  • ASP.NET: Tips, Tutorials, and Code
  • Designing Active Server Pages
  • Teach Yourself Active Server Pages 3.0 in 21 Days

I am a Microsoft MVP for ASP.NET.

I am an ASPInsider.