Just came across a really nasty problem which we have all faced from time to time. Well the past couple of days it was my turn. And email list of ~16,000 but each row having slightly different content, except the email address. The names could be slightly different e.g. concatenated in 1 field, not there, or in the 2 correct fields (people stop have a “Name” field with first and surname in it!).
The list had about 1,500 email addresses that appeared more than once in the 16,000. So how to remove the all the duplicates but leave 1 row in? As the number of duplicates was relatively low the loss of data wasn’t too much of a concern with more than 80% of the 1,500 not having more differences than the title field and the data updated. So what to do! SQL Server to the rescue!
/****** Script for SelectTopNRows command from SSMS ******/ with cte as ( select row_number() over (partition by EmailAddress order by EmailAddress) as [rn], * from [table] ) DELETE FROM cte WHERE rn > 1
This worked. And what’s better. Need to find out more about what I can do with partition. But so far it does enough!