Wednesday, March 28, 2012

Finding Duplicates

I have a pretty extensive database system I am querying, though what I am
looking for should be easy to find. I am looking for instances in the
Clients table that appear more than once (duplicate clients). I am using the
following:
Select First_Name + ' ' + Last_Name as Client,
Count(SSNum) as Count
From Clients
Group By First_Name, Last_Name, SSNum
Having Count(SSNum)>=2
Where I am confused is this; if I remove the Group By SSNum I have 126
instances. With the Group By SSNum I have 56. When I link in the account
table to get the site name and worker name I have 14 instances. Is there
another more reliable way to tell what names and ssn's happen more than
once?
TIA"John C Harris" <harris1113@.tampabay.rr.com> wrote in message
news:4416EDF4-0119-45CC-87A9-D23E2EEFB63B@.microsoft.com...
>I have a pretty extensive database system I am querying, though what I am
>looking for should be easy to find. I am looking for instances in the
>Clients table that appear more than once (duplicate clients). I am using
>the following:
> Select First_Name + ' ' + Last_Name as Client,
> Count(SSNum) as Count
> From Clients
> Group By First_Name, Last_Name, SSNum
> Having Count(SSNum)>=2
> Where I am confused is this; if I remove the Group By SSNum I have 126
> instances. With the Group By SSNum I have 56. When I link in the account
> table to get the site name and worker name I have 14 instances. Is there
> another more reliable way to tell what names and ssn's happen more than
> once?
> TIA
>
If you include SSNum in the GROUP BY then you will get one row for every
unique (First_Name, Last_Name, SSNum) that is duplicated.
If you don't include SSNum in the GROUP BY then you will get one row for
every unique (First_Name, Last_Name) that is duplicated.
Both methods are perfectly reliable but they tell you different things. It
all depends on what answer you want.
I guess the problem with the JOIN version is that you are doing an INNER
JOIN that is eliminating some rows. Difficult to say without seeing the
code.
Please post DDL, sample data and required results if you need more help.
--
David Portas|||Ah OK. I think I have it. Does this make sense:
I run the code below and get 56 names and ssns duplicated
I add the Worker ID and get 14 names
I add the Site Name and also get 14 names
So this is basically telling me that I have 56 clients in the table that
match more than once on name and SSN, but by adding the site and worker that
goes down to 14, meaning that I have 14 duplicates with the same name AND
same site/worker. So from that I can assume of the 56 names, 42 of them are
in more than one site, but only once in those sites?
That actually makes sense in looking through the raw data, because it
appears the same client went to more than one site.
"John C Harris" <harris1113@.tampabay.rr.com> wrote in message
news:4416EDF4-0119-45CC-87A9-D23E2EEFB63B@.microsoft.com...
>I have a pretty extensive database system I am querying, though what I am
>looking for should be easy to find. I am looking for instances in the
>Clients table that appear more than once (duplicate clients). I am using
>the following:
> Select First_Name + ' ' + Last_Name as Client,
> Count(SSNum) as Count
> From Clients
> Group By First_Name, Last_Name, SSNum
> Having Count(SSNum)>=2
> Where I am confused is this; if I remove the Group By SSNum I have 126
> instances. With the Group By SSNum I have 56. When I link in the account
> table to get the site name and worker name I have 14 instances. Is there
> another more reliable way to tell what names and ssn's happen more than
> once?
> TIA
>sql

No comments:

Post a Comment