How to delete duplicate rows from a table using the ROW_NUMBER function in SQL?
ROW_NUMBER()
is one of the SQL’s most useful functions which can be leveraged for many advanced scenarios and use cases. Some of those examples include finding a precise subset of rows, creating pagination in applications, finding top-N and bottom-N rows, ranking items, and many other types of sorting related queries.
One of the examples where I find this function very useful is identifying duplicate rows, which is a critical stage of data cleansing. During this process, we want to detect duplicate data which may be irrelevant to our data set, becuase they can cause invalid and messy reports and lead us to wrong calculations or bussines decisions. By using the ROW_NUMBER()
function, duplicates can be easily identifed and removed from the database.
Let me show you how that can be done with the ROW_NUMBER()
function.
We will start with a simple table ( called USER_INFO) that has 5 different columns which contain user info like name and address. The data in the table is not “clean” because there are some duplicate rows containing the same info in each column.
In this example, the 2nd and 3rd row are the same as well as the 5th and 6th row, and our goal is to identify and remove them from the table by using the ROW_NUMBER()
function. Our…