Email lists and duplicates

Just came across a really nasty problem which we have all faced from time to time.  Well the past couple of days it was my turn.  And email list of ~16,000 but each row having slightly different content, except the email address.  The names could be slightly different e.g. concatenated in 1 field, not there, or in the 2 correct fields (people stop have a “Name” field with first and surname  in it!).

The list had about 1,500 email addresses that appeared more than once in the 16,000.  So how to remove the all the duplicates but leave 1 row in?  As the number of duplicates was relatively low the loss of data wasn’t too much of a concern with more than 80% of the 1,500 not having more differences than the title field and the data updated.  So what to do!  SQL Server to the rescue!

/****** Script for SelectTopNRows command from SSMS  ******/
with cte as (
select row_number() over (partition by EmailAddress order by EmailAddress) as [rn],
*
from [table]
)
DELETE FROM cte WHERE rn > 1

This worked. And what’s better. Need to find out more about what I can do with partition. But so far it does enough!

2 thoughts on “Email lists and duplicates”

  1. Hey Jim. Yeah the list was already in the database. SQL Server makes it easy to import Excel spreadsheets. The CTE above is a SQL Server bit of awesome code!

Leave a Reply

Your email address will not be published.