In this article, you will learn how to remove duplicate rows in SQL Server using CTE. Database contains duplicate rows which can be filtered out using certain columns. This can happen for many reasons. When we want to show this data in our application, we need to manage these kinds of duplicated rows. So here, I am showing a simple way to remove duplicate rows.
What is CTE?
CTE stands for Common Table Expressions.
We define CTEs by adding a "WITH" clause directly before our SELECT, INSERT, UPDATE, DELETE, or MERGE statement. The WITH clause can include one or more CTEs, as shown in the following syntax.
- [WITH <common_table_expression> [,...]]
-
- <common_table_expression>::=
- cte_name [(column_name [,...])]
- AS (cte_query)
This can be represented like this…
So now, we have come to our main point, i.e., how to remove duplicated rows. Suppose, we have a table called ChecklistVersion which holds ChecklistID and ChecklistVersionID. The table may contain records like this.
ChecklistId | ChecklistVersionID |
1 | 11 |
2 | 12 |
3 | 13 |
1 | 11 |
1 | 11 |
3 | 13 |
Now, we want to select only one distinct pair. Let's see how we can do that.
Take a look at the SQL code below.
- With DataCte as
- (select *, RANK( )
- over(partition By ChecklistId, ChecklistVersionId order by ChecklistId) as rnk from ChecklitVersion )
- select * from DataCte
In the above SQL, DataCte is the CTE expression which acts as a temporary View. In the query definition, we are using RANK function and partitioning the table rows with ChecklistID and ChecklistVersionID to assign an occurrence number to each pair. The query will return a result as below.
ChecklistID | ChecklistVersionID | rnk |
1 | 11 | 1 |
1 | 11 | 2 |
1 | 11 | 3 |
2 | 12 | 1 |
3 | 13 | 1 |
3 | 13 | 2 |
Now, instead of the final SELECT query, we can delete the rows from our temporary result set which has rnk > 1.
- With DataCte as
- (select *, RANK( )
- over(partition By ChecklistId, ChecklistVersionId order by ChecklistId) as rnk from ChecklitVersion )
- select * from DataCte where rnk =1
Result of the above query.
ChecklistID | ChecklistVersionID | rnk |
1 | 11 | 1 |
2 | 12 | 1 |
3 | 13 | 1 |
This will just select only one occurrence of each ChecklistID and ChecklistVersionID pair from the ChecklistVersion table.