Thursday, March 29, 2012

Finding duplicates - what on earth am I overlooking?

I have two lists of contacts. They're similar. I want a list of all the contacts whose email address occurs only in the first list.

SELECT COUNT(DISTINCT EMAIL) FROM List1
returns 13460

SELECT COUNT(DISTINCT EMAIL) FROM List2
returns 13220

SELECT EMAIL FROM List1 WHERE EMAIL NOT IN (SELECT DISTINCT EMAIL FROM List2)
returns 0 rows

How can it be returning no rows? What am I failing to take into consideration?

:confused:selecting from lis2 where not in ( list 1 )|||Looks ok to me. This will be more efficient so give it a whirl and see if it sorts out your problem anyway:
SELECT EMAIL
FROM List1
WHERE NOT EXISTS(SELECT * FROM List2 WHERE List2.Email = List1.Email)|||Yup, thanks :) I added a bit to take the nulls out, but this seems to work. At least I get rows returned. I'm still at a loss as to why my first effort didn't work though... :confused:

SELECT EMAIL FROM List1
WHERE NOT EXISTS(SELECT * FROM List2 WHERE List2.email = List1.EMAIL AND List2.email IS NOT NULL)
AND List1.EMAIL IS NOT NULL|||You shouldn't need to explicitly remove NULLS... unless you have turned off ANSI_NULLS.|||I've no idea what state my ansi_nulls are in, but both tables have records where the EMAIL field is null, and if I run the query without the nulls-removing bits, just over half of the 13,000-or-so email addresses returned are NULL. I'm just anticipating the account manager coming back to me with "Who's this NULL bloke and why is he spamming our contacts database?"|||Sorry yes - not thinking through. And good looking blocking that NULL fella ;)

No comments:

Post a Comment