Fragmented array: pretty

Wednesday, March 28, 2012

Finding Duplicates

I have a pretty extensive database system I am querying, though what I am
looking for should be easy to find. I am looking for instances in the
Clients table that appear more than once (duplicate clients). I am using the
following:
Select First_Name + ' ' + Last_Name as Client,
Count(SSNum) as Count
From Clients
Group By First_Name, Last_Name, SSNum
Having Count(SSNum)>=2
Where I am confused is this; if I remove the Group By SSNum I have 126
instances. With the Group By SSNum I have 56. When I link in the account
table to get the site name and worker name I have 14 instances. Is there
another more reliable way to tell what names and ssn's happen more than
once?
TIA"John C Harris" <harris1113@.tampabay.rr.com> wrote in message
news:4416EDF4-0119-45CC-87A9-D23E2EEFB63B@.microsoft.com...
>I have a pretty extensive database system I am querying, though what I am
>looking for should be easy to find. I am looking for instances in the
>Clients table that appear more than once (duplicate clients). I am using
>the following:
> Select First_Name + ' ' + Last_Name as Client,
> Count(SSNum) as Count
> From Clients
> Group By First_Name, Last_Name, SSNum
> Having Count(SSNum)>=2
> Where I am confused is this; if I remove the Group By SSNum I have 126
> instances. With the Group By SSNum I have 56. When I link in the account
> table to get the site name and worker name I have 14 instances. Is there
> another more reliable way to tell what names and ssn's happen more than
> once?
> TIA
>
If you include SSNum in the GROUP BY then you will get one row for every
unique (First_Name, Last_Name, SSNum) that is duplicated.
If you don't include SSNum in the GROUP BY then you will get one row for
every unique (First_Name, Last_Name) that is duplicated.
Both methods are perfectly reliable but they tell you different things. It
all depends on what answer you want.
I guess the problem with the JOIN version is that you are doing an INNER
JOIN that is eliminating some rows. Difficult to say without seeing the
code.
Please post DDL, sample data and required results if you need more help.
--
David Portas|||Ah OK. I think I have it. Does this make sense:
I run the code below and get 56 names and ssns duplicated
I add the Worker ID and get 14 names
I add the Site Name and also get 14 names
So this is basically telling me that I have 56 clients in the table that
match more than once on name and SSN, but by adding the site and worker that
goes down to 14, meaning that I have 14 duplicates with the same name AND
same site/worker. So from that I can assume of the 56 names, 42 of them are
in more than one site, but only once in those sites?
That actually makes sense in looking through the raw data, because it
appears the same client went to more than one site.
"John C Harris" <harris1113@.tampabay.rr.com> wrote in message
news:4416EDF4-0119-45CC-87A9-D23E2EEFB63B@.microsoft.com...
>I have a pretty extensive database system I am querying, though what I am
>looking for should be easy to find. I am looking for instances in the
>Clients table that appear more than once (duplicate clients). I am using
>the following:
> Select First_Name + ' ' + Last_Name as Client,
> Count(SSNum) as Count
> From Clients
> Group By First_Name, Last_Name, SSNum
> Having Count(SSNum)>=2
> Where I am confused is this; if I remove the Group By SSNum I have 126
> instances. With the Group By SSNum I have 56. When I link in the account
> table to get the site name and worker name I have 14 instances. Is there
> another more reliable way to tell what names and ssn's happen more than
> once?
> TIA
>sql

Wednesday, March 21, 2012

Find the nonexisting Event

Hello everyone I am pretty new with T-SQL and SQL Server 2000 and I need some help or some suggestions on a specific problem. Here goes...

I have 2 tables that Iam using

EVENTS Event_Codes

PersonIdNo EventCodeIdNo EventCodeIdNo EvenCode

6349 13 13 Criminal History

6349 31 31 DL

6349 30 30 CDL

6345 75 75 EMPLOYMENT REF

6345 13 74 Texas Lic

2 CDA

4 Emp Suggestions

I need to find the EventCodeIdNo's that are missing for each PERSON in the Events Table. I've tried creating Arrays and looping through them but I haven't had much success with that.

Thank you,

-DM

Code Snippet

There might be better solutions, if the number of rows are small in events then you could use something like this.

select distinct PersonIdNo, ce.EventCodeIdNo, ce.EvenCode from
(select
PersonIdNo, ec.EventCodeIdNo, ec.EvenCode
from events e
cross join Event_Codes ec) ce
where not exists
(select * from events where events.PersonIdNo = ce.PersonIdNo
and events.EventCodeIdNo = ce.EventCodeIdNo)

|||

Well, assuming you also have a person table, you take a cross join of the persons and events, then remove the ones where there is a match.

drop table eventAttendee, event, person
go
create table person --added because a person might not have attended an event at all
(
personId int primary key
)
create table event
(
eventCode int primary key
)
create table eventAttendee
(
personId int references person(personId),
eventCode int references event(eventCode),
primary key (personId, eventCode)
)
insert into person
select 6349
union all
select 6345
insert into event
select 13
union all
select 31
union all
select 30
union all
select 75
union all
select 74
union all
select 2
union all
select 4

insert into eventAttendee
select 6349, 13
union all
select 6349, 31
union all
select 6349, 30
union all
select 6345, 75
union all
select 6345, 13

select *
from event
cross join person
where not exists ( select *
from eventAttendee
where event.eventCode = eventAttendee.eventCode
and person.personId = eventAttendee.personId)

|||

Louis has provided a nice elegant solution.

I offer this suggestion as an alternative in case you don't wish to involve the Person Table in the query. Depending upon indexing, this should be relatively quick.

SET NOCOUNT ON

DECLARE @.Events table
( RowID int IDENTITY,
PersonID int,
EventCodeID int
)

INSERT INTO @.Events VALUES ( 6349, 13 )
INSERT INTO @.Events VALUES ( 6349, 31 )
INSERT INTO @.Events VALUES ( 6349, 30 )
INSERT INTO @.Events VALUES ( 6345, 75 )
INSERT INTO @.Events VALUES ( 6345, 13 )

DECLARE @.EventCodes table
( RowID int IDENTITY,
EventCodeID int,
EventCode varchar(25)
)

INSERT INTO @.EventCodes VALUES ( 13, 'Criminal History' )
INSERT INTO @.EventCodes VALUES ( 31, 'DL' )
INSERT INTO @.EventCodes VALUES ( 30, 'CDL' )
INSERT INTO @.EventCodes VALUES ( 75, 'EMPLOYMENT REF' )
INSERT INTO @.EventCodes VALUES ( 74, 'Texas Lic' )
INSERT INTO @.EventCodes VALUES ( 2, 'CDA' )
INSERT INTO @.EventCodes VALUES ( 4, 'Emp Suggestions' )

SELECT DISTINCT
dt.PersonID,
dt.EventCodeID,
dt.EventCode
FROM @.Events e
RIGHT JOIN (SELECT DISTINCT
e1.PersonID,
ec.EventCodeID,
ec.EventCode
FROM @.Events e1
CROSS JOIN @.EventCodes ec
) dt
ON ( e.EventCodeID = dt.EventCodeID
AND e.PersonID = dt.PersonID

)
WHERE e.PersonID IS NULL
ORDER BY
dt.PersonID,
EventCodeID

PersonID EventCodeID EventCode
-- -- -
6345 2 CDA
6345 4 Emp Suggestions
6345 30 CDL
6345 31 DL
6345 74 Texas Lic
6349 2 CDA
6349 4 Emp Suggestions
6349 74 Texas Lic
6349 75 EMPLOYMENT REF

|||

Here I give both fast & slow one. According to your data-volume which ever fits use it,

Code Snippet

Create Table #event_codes (

[EventCodeIdNo] int primary key ,

[EvenCode] Varchar(100)

);

Insert Into #event_codes Values('13','Criminal-History');

Insert Into #event_codes Values('31','DL');

Insert Into #event_codes Values('30','CDL');

Insert Into #event_codes Values('75','EMPLOYMENT-REF');

Insert Into #event_codes Values('74','Texas-Lic');

Insert Into #event_codes Values('2','CDA');

Insert Into #event_codes Values('4','Emp-Suggestions');

Create Table #events (

[PersonIdNo] int ,

[EventCodeIdNo] int primary key([PersonIdNo],[EventCodeIdNo])

);

Insert Into #events Values('6349','13');

Insert Into #events Values('6349','31');

Insert Into #events Values('6349','30');

Insert Into #events Values('6345','75');

Insert Into #events Values('6345','13');

Code Snippet

--complex & slow one

select

pes.[PersonIdNo],

pes.[EventCodeIdNo],

pes.[EvenCode]

from

#events es

full outer join

(select Distinct es.[PersonIdNo],ec.[EventCodeIdNo] ,ec.[EvenCode]

from #events es

cross join #event_codes ec) as pes

On pes.[PersonIdNo]=es.[PersonIdNo]

and pes.[EventCodeIdNo] = es.[eventcodeidno]

Where

es.[EventCodeIdNo] is null

Code Snippet

--fast & good one

select

from

#event_codes ec

cross join (select distinct [personidno] from #events) p

where

NOT exists

(

select 1 from #events es

where es.[eventcodeidno] = ec.[eventcodeidno]

and es.[personidno]=p.[personidno]

)

|||

Thank you all for your kind suggestions. I went ahead and used Sankar Reddy's suggestions because I have way to many person id and event combos to insert into temp tables. Thanks again for all your suggestions this is a great community!

-Dm

Wednesday, March 28, 2012

Finding Duplicates

Wednesday, March 21, 2012

Find the nonexisting Event

Fragmented array

Blog Archive

About Me