I had a requirement where I wanted to delete all the duplicate email ids from my table.
My table structure is :
* EMAIL_ID
uniqueidentifier PRIMARY KEY
* EMAIL
varchar (128) Stores the email address
I write the following query to find out the duplicate records :
SELECT EMAIL,
COUNT(*) DUPLICATES
FROM EMAILS
GROUP BY EMAIL
HAVING COUNT(*) > 1
ORDER BY DUPLICATES
Here's the delete Query:
Here's the delete Query:
DELETE FROM EMAILS
WHERE EMAIL_ID
IN (
SELECT EMAIL_ID
FROM (
SELECT EMAIL, EMAIL_ID
,
RANK() OVER (PARTITION BY EMAIL ORDER BY EMAIL_ID
) AS EMAIL_RANK
FROM EMAILS
WHERE EMAIL
IN (
SELECT EMAIL
FROM EMAILS
GROUP BY EMAIL
HAVING COUNT(*) > 1
)
) A
WHERE A.EMAIL_RANK != 1
)
There were lot of other approaches which I found, most of them suggested using cursors or using Set Row_Count 1 and then running the delete query iteratively. But I wanted to do it in one shot.
It took me less than a second to delete almost 1000 duplicate rows from around 10,000 rows in my table.
I am sure there would be better approaches, surely to avoid multiple IN clauses. Just in case you have one put in the comments.
I got to figure out one for Oracle as well.