Fragmented array: duplicates

Thursday, March 29, 2012

Finding duplicates in the table

source table AAA
-------

pvn_acct_id
acct_id

There is no primary key, so there can be duplicates, for example:

pvn_acct_id acct_id
======= =========
111 111
111 111
111 111

222 333
222 334
222 335

333 222
334 222
335 222

Using the query below to extract data, the first set yields (111, 111), the second set (222, 335), and the third set (335, 222).

select distinct a.pvn_acct_id, a.acct_id
from owner.AAA a
where a.acct_id = (select max(b.acct_id) from tstg.t_acct_num_hist b
where a.pvn_acct_id = b.pvn_acct_id)
and a.pvn_acct_id = (select max(c.pvn_acct_id) from tstg.t_acct_num_hist c
where a.acct_id = c.acct_id)

The query above populates the target table BBB directly.

What I want to find is a sample of the data showing the different acct_id values that share the same pvn_acct_id, and vice versa.
For example:

I want to see that where pvn_acct_id is 222, acct_id is 333, 334, and 335, not just acct_id = 335.

Please help.
|||Did you try this:

select distinct a.pvn_acct_id, a.acct_id
from owner.AAA a

;)|||select pvn_Acct_id from owner.AAA aaa where acct_id in (select acct_id from owner.BBB bbb
where aaa.pvn_acct_id != bbb.pvn_acct_id) order by pvn_Acct_id

select Acct_id from owner.AAA aaa where pvn_acct_id in (select pvn_acct_id from owner.BBB bbb
where aaa.acct_id != bbb.acct_id) order by Acct_id

now this query works. got it with trial and error. have fun testing.
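
For the original question (which pvn_acct_id values map to more than one acct_id, and the reverse), a grouped subquery is a common alternative to comparing AAA against BBB. The following is only a sketch: it runs through Python's built-in sqlite3 module against an in-memory copy of the sample rows from the post, so the unqualified table name AAA is an assumption made for illustration, not the real owner.AAA.

```python
import sqlite3

# In-memory stand-in for owner.AAA, loaded with the sample rows from the post.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE AAA (pvn_acct_id INTEGER, acct_id INTEGER)")
sample = [(111, 111)] * 3 + [
    (222, 333), (222, 334), (222, 335),
    (333, 222), (334, 222), (335, 222),
]
conn.executemany("INSERT INTO AAA VALUES (?, ?)", sample)

# Every (pvn_acct_id, acct_id) pair whose pvn_acct_id has several acct_ids.
one_pvn_many_accts = conn.execute("""
    SELECT DISTINCT pvn_acct_id, acct_id
    FROM AAA
    WHERE pvn_acct_id IN (SELECT pvn_acct_id
                          FROM AAA
                          GROUP BY pvn_acct_id
                          HAVING COUNT(DISTINCT acct_id) > 1)
    ORDER BY pvn_acct_id, acct_id
""").fetchall()

# The reverse: every pair whose acct_id appears under several pvn_acct_ids.
one_acct_many_pvns = conn.execute("""
    SELECT DISTINCT pvn_acct_id, acct_id
    FROM AAA
    WHERE acct_id IN (SELECT acct_id
                      FROM AAA
                      GROUP BY acct_id
                      HAVING COUNT(DISTINCT pvn_acct_id) > 1)
    ORDER BY pvn_acct_id, acct_id
""").fetchall()

print(one_pvn_many_accts)
print(one_acct_many_pvns)
```

The exact duplicates (111, 111) drop out of both results because COUNT(DISTINCT ...) counts them once.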

Finding duplicates - what on earth am I overlooking?

I have two lists of contacts. They're similar. I want a list of all the contacts whose email address occurs only in the first list.

SELECT COUNT(DISTINCT EMAIL) FROM List1
returns 13460

SELECT COUNT(DISTINCT EMAIL) FROM List2
returns 13220

SELECT EMAIL FROM List1 WHERE EMAIL NOT IN (SELECT DISTINCT EMAIL FROM List2)
returns 0 rows

How can it be returning no rows? What am I failing to take into consideration?

:confused:
|||selecting from List2 where not in (List1)|||Looks OK to me. This will be more efficient so give it a whirl and see if it sorts out your problem anyway:
SELECT EMAIL
FROM List1
WHERE NOT EXISTS(SELECT * FROM List2 WHERE List2.Email = List1.Email)|||Yup, thanks :) I added a bit to take the nulls out, but this seems to work. At least I get rows returned. I'm still at a loss as to why my first effort didn't work though... :confused:

SELECT EMAIL FROM List1
WHERE NOT EXISTS(SELECT * FROM List2 WHERE List2.email = List1.EMAIL AND List2.email IS NOT NULL)
AND List1.EMAIL IS NOT NULL|||You shouldn't need to explicitly remove NULLS... unless you have turned off ANSI_NULLS.|||I've no idea what state my ansi_nulls are in, but both tables have records where the EMAIL field is null, and if I run the query without the nulls-removing bits, just over half of the 13,000-or-so email addresses returned are NULL. I'm just anticipating the account manager coming back to me with "Who's this NULL bloke and why is he spamming our contacts database?"|||Sorry yes - not thinking through. And good looking blocking that NULL fella ;)
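
The empty result from the original NOT IN query is consistent with SQL's three-valued logic: if the subquery returns even one NULL, `x NOT IN (...)` never evaluates to true for a non-matching x. Since both tables are said to contain NULL emails, that is the likely cause. The sketch below uses made-up addresses and Python's built-in sqlite3 module (an assumption for illustration; SQL Server with its default ANSI_NULLS setting treats these three queries the same way).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE List1 (Email TEXT);
    CREATE TABLE List2 (Email TEXT);
    INSERT INTO List1 VALUES ('a@example.com'), ('b@example.com'), (NULL);
    INSERT INTO List2 VALUES ('a@example.com'), (NULL);
""")

# 1. Plain NOT IN: the NULL in List2 makes every comparison unknown or false.
not_in = conn.execute(
    "SELECT Email FROM List1 WHERE Email NOT IN (SELECT Email FROM List2)"
).fetchall()

# 2. NOT IN with NULLs filtered out of the subquery.
not_in_filtered = conn.execute(
    "SELECT Email FROM List1 WHERE Email NOT IN "
    "(SELECT Email FROM List2 WHERE Email IS NOT NULL)"
).fetchall()

# 3. NOT EXISTS: works, but List1 rows with a NULL email are returned too.
not_exists = conn.execute(
    "SELECT Email FROM List1 WHERE NOT EXISTS "
    "(SELECT 1 FROM List2 WHERE List2.Email = List1.Email)"
).fetchall()

print(not_in)
print(not_in_filtered)
print(not_exists)
```

The third query also shows why NULL rows came back from the NOT EXISTS version in the thread: a NULL email in List1 never equals anything in List2, so the row passes unless it is filtered out explicitly.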

finding duplicates - is this right?

I've done a search and I THINK I've got my head round this, but I'd be very grateful if someone could reassure me:

SELECT Email FROM List1 WHERE EXISTS (
SELECT Email FROM List2 WHERE List2.Email= List1.Email
) AND List1.Email <> '44'

That will give me every email address from list one that (a) appears in list two, and (b) isn't '44'. Right?

And to find all the emails from List1 that DON'T occur in List2 (and aren't '44'), I just put "NOT" in front of "EXISTS". Right?

Sorry for asking an obvious question but I'm having a real mental block here. :o
|||Yep, that should give you the results you want!
Here's my 2 stabs at the problem - please correct me if I'm wrong.

SELECT Email
FROM List1
WHERE Email IN
(
SELECT Email
FROM List2
)
AND Email <> '44'

SELECT Email
FROM List1
INNER JOIN List2
ON List1.Email <> List2.Email
WHERE List1.Email <> '44'|||please correct me if I'm wrong.
The second one is wrong; it's almost a complete cross join.|||Oh sorry! I see why it's wrong now - silly me :D
Thanks Rudy ^_^|||Thank you :)

Regarding your first stab - that's what I had for finding the duplicates, but when I tried using that to find the non-duplicates (by sticking "NOT" in front of "IN"), I get no rows returned. I'm still at a loss as to why that would happen - does "NOT" not work with "IN"?|||does "NOT" not work with "IN"?
no, it does :)|||Try these methods instead:
select distinct List1.Email
from List1
inner join List2 on List1.Email = List2.Email
where List1.Email <> '44'

select distinct List1.Email
from List1
left outer join List2 on List1.Email = List2.Email
where List1.Email <> '44'
and List2.Email is null
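
To see why the `<>` join was called "almost a complete cross join", and how the two join-based methods behave, here is a small sketch. The addresses are invented, and it runs on Python's built-in sqlite3 module rather than SQL Server, so treat it as an illustration under those assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE List1 (Email TEXT);
    CREATE TABLE List2 (Email TEXT);
    INSERT INTO List1 VALUES ('a@example.com'), ('b@example.com'),
                             ('c@example.com'), ('44');
    INSERT INTO List2 VALUES ('a@example.com'), ('b@example.com'), ('44');
""")

def emails(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]

# Joining on <> pairs each List1 row with every List2 row that differs from it.
neq_join = emails("""
    SELECT List1.Email
    FROM List1
    INNER JOIN List2 ON List1.Email <> List2.Email
    WHERE List1.Email <> '44'
    ORDER BY List1.Email
""")

# Emails present in both lists (inner join on equality).
in_both = emails("""
    SELECT DISTINCT List1.Email
    FROM List1
    INNER JOIN List2 ON List1.Email = List2.Email
    WHERE List1.Email <> '44'
    ORDER BY List1.Email
""")

# Emails only in List1 (left outer join, keep rows with no match).
only_in_list1 = emails("""
    SELECT DISTINCT List1.Email
    FROM List1
    LEFT OUTER JOIN List2 ON List1.Email = List2.Email
    WHERE List1.Email <> '44'
      AND List2.Email IS NULL
    ORDER BY List1.Email
""")

print(len(neq_join), sorted(set(neq_join)))
print(in_both)
print(only_in_list1)
```

The `<>` join returns every List1 address, including the ones that do appear in List2, which is why it cannot answer either question.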

Finding Duplicates

I am trying to complete an insert from a query, but I have duplicates, so I'm getting an error message. To correct it I am writing a find-duplicates statement in Query Analyzer, but it's not working. Can someone tell me what's wrong with this statement? (By the way, I'm on SQL Server 2000.)

thank you

SELECT EmployeeGamingLicense [TM#]AS [TM# Field], Count([TM#])AS NumberOfDups
FROM TERMINATION
GROUP BY [TM#]
HAVING Count([TM#])>1;
GO
|||Hey Desiree...been a while...

What's this:

EmployeeGamingLicense [TM#]AS [TM# Field],

Doesn't look like one column...how about the DDL for the TERMINATION table...

Or how about

SELECT [TM#], Count([TM#]) AS NumberOfDups
FROM TERMINATION
GROUP BY [TM#]
HAVING Count([TM#])>1|||HI BRETT!!! :) HOw ya Been?

You're right, I totally overlooked that, thank you. Works perfectly now. I've got another question though: what if I wanted to find duplicate records from one table to another? I would like to see if I can find matching records in two tables. One is TERMINATION.tbl, the other EmployeeGamingLicense.tbl. Can I make this into a stored procedure?

SELECT EmployeeGamingLicense.[TM#], EmployeeGamingLicense.LASTNAME, EmployeeGamingLicense.FIRSTNAME, EmployeeGamingLicense.[SSN#]
FROM EmployeeGamingLicense LEFT JOIN TERMINATION ON EmployeeGamingLicense.[TM#] = TERMINATION.[TM #]
WHERE (((TERMINATION.[TM #]) Is Not Null));|||what if I wanted to find duplicate records from one table to another. I would like to be able to see if I can find matching records in two tables...

I'd have to ask you what that means...

For example...in your previous query, you found dups...

Do you want to see the rows in the License table where you have dups in TERMINATION?

Do you want to see where you have dups because of a join between the 2?

I'm not sure...|||I'm good my statement worked I found the duplicates I needed to find. The thing is I'm Inheriting databases and I need to find the errors with them before I create the Gui for the client. Thank you Brett :)|||So you have to audit the entire db that has a poorly defined schema...

good luck

Any dates or numerics in varchar fields?

Once you clean up the dups...do you plan to put unique constraints on?|||No, the dates are in (datatype: smalldatetime) and the numbers like SSN# are in (datatype: int) :)|||A little off topic but what do you do with SSN's that begin with 0?|||Anything that does not get math applied to it is not a number in my book...

You want to constrain it to be numeric...that's fine...otherwise it's char or varchar|||Is it OK though for them to be int and smalldatetime, or do you suggest char or nvarchar?|||I agree, we use char(9) in our DBs, but I've had to fight off developers who want SSN and TIN fields as int datatypes. I hate having to determine whether I need to add leading zeros back to SSNs stored as int or whether the SSN is truly corrupt/bad data.|||OK, this is definitely one of those learning processes I desperately need. I love the feedback; it forces me to analyze the datatypes more deeply, especially for future purposes. OK, so you would recommend char instead.|||OK, integer data consists of negative and positive whole numbers such as -15, 0, 5, and 2,509. The int base data type holds a number in the range -2,147,483,648 through 2,147,483,647 and requires only 4 bytes of storage per value. And char can hold numerics, and can be up to 8,000 characters long.|||Dates should definitely stay as dates...why small though?

Is space an issue?

Just a rule of thumb....

Does math need to applied? No? Then it's char or varchar...

Now here's another one...

Are your dates nullable or not nullable?|||A little off topic but what do you do with SSN's that begin with 0?
Worse yet, what do you do for the weirdos among us with letters in their SSN?

-PatP|||What is that supposed to mean?|||No, good question, I didn't even think about that. No, space is not an issue, so it doesn't have to be smalldatetime; it can just be the datetime datatype instead.

THANK YOU :)

I Love this really makes you think and plan ahead, and with database you need to plan ahead eheheh|||Is anyone as in love with BOOKS ONLINE as I am. :)
and this FORUM OF COURSE|||Just make everything nvarchar(4000) especially since space isn't an issue. That way every column is the same and easier to manage. Better yet text works well too.|||Just make everything nvarchar(4000) especially since space isn't an issue. That way every column is the same and easier to manage. Better yet text works well too.

Now Peter...|||SSNs should be char(9). Listen to Der Kaisermann.

(OK, char(11) would be a second choice...)|||SSNs should be char(9). Listen to Der Kaisermann.

(OK, char(11) would be a second choice...)If you are only storing the kind of SSN/TIN numbers used today, that might be true. If you have to store all of the SSN values, you'll need a VARCHAR(12).

For most purposes, you can get by with only the 9 digit SSN values used today, and an indicator for non-standard ones that you can then describe in more detail for the exceptions in some kind of "notes" column.

-PatP|||What is that suppose to mean?

OK, examples?...but my spider sense says you're right...

Only because peoplesoft has National Id as varchar(12)|||The commonly used SSN values today are 999-99-9999

Immigrant families, circa 1948-1952 are formatted like 999-99-9999#A9, with the first nine digits being the same for all of them.

Families of business owners, circa 1963-1965 were formatted 999-99-9999C99, with the first nine digits being the business owner's SSN.

Although rare, tax paying non-residents are formatted as either aa9-000-99-9999, 999-00-9999AA9, or a few variants. These don't follow much of any rules that I know of, although there might be some rhyme or reason under the surface.

There aren't a bunch of these "problem child" values, but there are enough that they can cause massive headaches.

Then there are the real pain-in-the-patoot types that have earned significant amounts under one of those "oddball" formats, then were given another one of the commonly used variety (999-99-9999) for "administrative convenience" and established enough earnings/etc that neither could be rescinded. Those folks have the distinction of having two active, legitimate SSNs, and may even be paying taxes on both of them at the same time! They can ball up just about anything, since even the packages that support the nonstandard formats have no real way to establish a relationship between them!

-PatP|||Wow, I had to deal with SSN's, but not in the case where a person has more than 1 at any given time... In my case every newborn is given a fake SSN (usually mother's SSN-1) until a name is given to a child (boggles my mind when a mother is carrying a baby for 9 months and at the time of birth has no idea how to call it!!!!!!)|||Pat, I do NOT want to play Trivial Pursuit with you.|||Pat, I do NOT want to play Trivial Pursuit with you.It seems like nobody wants to play with me!

I've got a bunch of great Trivial Pursuit stories. I once won a game where I went 5 for 5, then spent something like 15 turns trying to get the sports one (in a room of sports-aholics).

-PatP|||It seems like nobody wants to play with me!
-PatP

Not even the girlfreind?|||Brett, you don't play Trivial Pursuit with your girlfriend. You play Twister.|||The commonly used SSN values today are 999-99-9999

Immigrant families, circa 1948-1952 are formatted like 999-99-9999#A9, with the first nine digits being the same for all of them.

-PatP

...and here I thought it was something Canadien?!?|||No, Canadian SSNs use the metric system. Everybody knows that.|||If you are only storing the kind of SSN/TIN numbers used today, that might be true. If you have to store all of the SSN values, you'll need a VARCHAR(12).

For most purposes, you can get by with only the 9 digit SSN values used today, and an indicator for non-standard ones that you can then describe in more detail for the exceptions in some kind of "notes" column.

-PatP
I'm a bit confused. My table contains SSNs (all unique) and I've indexed the column, which is often used in WHERE clauses. It's said that indexes should not be created on free-form text columns (usually varchar). Some bad examples in this context are a Flag column and an SSN column. But since SSNs are unique, surely one can index the column.

Would CHAR(12) or VARCHAR(12) be better?

Thanx!|||What? You still here? What were we talking about, again?

Oh yeah. I remember.

"Thou Shalt Not Create Indexes On Character Fields"? I know of no such commandment. They aren't as efficient as indexes on, say, integer columns, but its still much better than not having an index on your character field.

But I'd use varchar unless you want to enforce the length of the string. That's why I originally suggested char(9) or char(11), but as Pat pointed out not ALL social security numbers are the same length.|||WHAT?

I thought you used IDENTITY Columns in every table?|||sorry to for the stupid question but what do you mean By Free-Form what is exactly is that?

Inquisitive by nature :)|||I cant believe we are still on this you guys are so funny

I love it :)|||What? Me use IDENTITY?

I NEVER use IDENTITY columns.

I'm the GUID GUY, remember?

...and by Free-form, he means that users can put whatever text they want in the field. It's not populated from a drop-down list, for instance.|||There is a hard limit to the size of an index key of just over 900 bytes. So you can not index a varchar(1000) field, if you expect to be storing values that are 1000 characters long. If all of the values happen to be less than the limit (I do not know it off hand. I know, bad DBA), then you can get away with it. Of course, the first 1000 character value to come along will hurt you bad.|||What? You still here? What were we talking about, again?

Oh yeah. I remember.

"Thou Shalt Not Create Indexes On Character Fields"? I know of no such commandment. They aren't as efficient as indexes on, say, integer columns, but its still much better than not having an index on your character field.

But I'd use varchar unless you want to enforce the length of the string. That's why I originally suggested char(9) or char(11), but as Pat pointed out not ALL social security numbers are the same length.
Sorry, bear with me please!
Shouldn't one index a column even if it's unique? SSNs are usually used in WHERE clauses in applications (in my application, just like emp.id). Users in the app access individual records on the basis of SSN. If a table has millions of records, will that be an Index Scan or a Table Scan?
It's a stupid question, but that doesn't bother me. Your answer may help clear up the concepts. :cool:|||You may get an Index Seek. But even an Index Scan is better than a Table Scan in your situation, because there are far fewer index pages for an SSN index on char(9) than data pages for your table.|||Sorry, bear with me please!
Shouldn't one index a column even if it's unique? SSNs are usually used in WHERE clauses in applications (in my application, just like emp.id). Users in the app access individual records on the basis of SSN. If a table has millions of records, will that be an Index Scan or a Table Scan?
It's a stupid question, but that doesn't bother me. Your answer may help clear up the concepts. :cool:
Well, if it's truly unique it should have a unique index or unique constraint (which creates a unique index) on the column. Check out this article about index rules in general http://www.sql-server-performance.com/optimizing_indexes.asp|||You may get an Index Seek. But even an Index Scan is better than a Table Scan in your situation, because there are far fewer index pages for an SSN index on char(9) than data pages for your table.

Getting off the floor and into my barstool...umm office chair

What?

You're not suggesting a scan is a good thing? Are you?|||You're not suggesting a scan is a good thing? Are you?An index scan versus a table scan... It could be a good thing, in relative terms.

-PatP
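
The leading-zero concern raised earlier in this thread is easy to reproduce. The sketch below is illustrative only: it uses Python's built-in sqlite3 module rather than SQL Server, and the table names emp_int and emp_char and the SSN value are made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp_int (ssn INTEGER)")   # SSN stored as a number
conn.execute("CREATE TABLE emp_char (ssn TEXT)")     # SSN stored as text, like char(9)

ssn = "012345678"  # made-up value with a leading zero

# Converting to an integer is what an int column effectively does on load.
conn.execute("INSERT INTO emp_int VALUES (CAST(? AS INTEGER))", (ssn,))
conn.execute("INSERT INTO emp_char VALUES (?)", (ssn,))

stored_int = conn.execute("SELECT ssn FROM emp_int").fetchone()[0]
stored_char = conn.execute("SELECT ssn FROM emp_char").fetchone()[0]

print(stored_int)    # the leading zero is gone
print(stored_char)   # the text column keeps every character

# Padding back to nine digits restores the zero, but nothing in the stored
# integer says whether a zero was stripped or a digit was lost to bad data.
repadded = str(stored_int).zfill(9)
print(repadded)
```

This is the ambiguity described above: an eight-digit integer could be a valid SSN that lost its zero or a genuinely corrupt value, and the int column cannot tell the two apart.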

Finding duplicates

Here is what I have so far. Would anyone know if this query picks up all id
numbers that have the same amount on a certain settlement batch number? They
are also both credit sales that were approved. Thanks for any info.
Use WinPayment
GO
SELECT
pos_condition_code,
convert(char(11), retrieval_reference_number) RR,
message_type,
authorization_identification,
convert(char(8), card_acceptor_identification) SN,
convert(char(25), transaction_name) TransactionName,
isnull(convert(char(2), id_code_1), ' ') ID,
convert (char (20), id_number_1)CardNumber,
convert(char(20), time_stamp)POSTime,
convert (char (20), transmission_date_and_time)TransmissionTime,
convert(char(2), response_code) RC,
isnull(convert(char(2), host_response_code), '') HRC,
convert(char(20), host_response_string)Message,
convert(char(7), stan) STAN,
convert(char(12), transaction_amount) Amount,
settlement_data
FROM
financial_message as Sale (NOLOCK)
Where
settlement_batch_number = '773'
AND
transaction_name = 'Credit Sale'
And
host_response_string = 'Successful Approval'
AND EXISTS
(Select *
FROM financial_message AS Rev
WHERE Rev.id_number_1 = Sale.id_number_1
AND Rev.transaction_amount = Sale.transaction_amount
AND Rev.settlement_batch_number = sale.settlement_batch_number
AND Rev.transaction_name = sale.transaction_name
And Rev.host_response_string = sale.host_response_string)
order by
time_stamp
|||Your filter is based on this
Rev.id_number_1 = Sale.id_number_1
AND Rev.transaction_amount = Sale.transaction_amount
AND Rev.settlement_batch_number = sale.settlement_batch_number
AND Rev.transaction_name = sale.transaction_name
And Rev.host_response_string = sale.host_response_string
Thus, only row(s) that satisfy the requirements would be returned.
-oj
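
One thing worth checking with a correlated EXISTS against the same table: every row trivially matches itself, so the subquery is satisfied even when there is no second row. The sketch below illustrates this under stated assumptions: it runs on Python's built-in sqlite3 module, the msg_id key column is hypothetical (the real financial_message table may have a different unique key, or none), amounts are stored as integer cents, and only three of the original columns are modelled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE financial_message (
        msg_id INTEGER PRIMARY KEY,        -- hypothetical unique key
        id_number_1 TEXT,
        transaction_amount INTEGER,        -- amount in cents (assumption)
        settlement_batch_number TEXT
    );
    INSERT INTO financial_message VALUES
        (1, 'CARD-A', 1000, '773'),
        (2, 'CARD-A', 1000, '773'),   -- same card and amount as row 1
        (3, 'CARD-B',  500, '773');   -- no duplicate
""")

BASE = """
    SELECT Sale.msg_id
    FROM financial_message AS Sale
    WHERE Sale.settlement_batch_number = '773'
      AND EXISTS (SELECT *
                  FROM financial_message AS Rev
                  WHERE Rev.id_number_1 = Sale.id_number_1
                    AND Rev.transaction_amount = Sale.transaction_amount
                    AND Rev.settlement_batch_number = Sale.settlement_batch_number
                    {extra})
    ORDER BY Sale.msg_id
"""

# As posted: each row finds itself in the subquery, so nothing is filtered.
self_matching = [r[0] for r in conn.execute(BASE.format(extra=""))]

# Requiring a different row leaves only the true duplicates.
duplicates_only = [
    r[0] for r in conn.execute(BASE.format(extra="AND Rev.msg_id <> Sale.msg_id"))
]

print(self_matching)
print(duplicates_only)
```

If the table has no unique column to compare on, grouping on the matching columns with HAVING COUNT(*) > 1 is the usual alternative.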

Finding duplicates

I wanted to know if there is any way to pick out duplicates in a certain day.
Days are set up as a settlement_batch_number.
So on a certain settlement_batch_number, is there a way to pick out
duplicates where
store number, card number, and amount are all the same?
Does a GROUP BY accomplish this? Any help would be appreciated.
Thanks.
|||SELECT settlement_batch_number, store_number, card_number, amount
FROM YourTable
GROUP BY settlement_batch_number, store_number, card_number, amount
HAVING COUNT(*)>1
--
David Portas
SQL Server MVP
--|||If I just wanted to specify a certain settlement_batch_number where would I
place that in the statement. Like say 784 was a settlement_batch_number I
wanted where would I put that in the statement. Thanks
|||On Mon, 7 Feb 2005 08:37:05 -0800, tarheels4025 wrote:

>If I just wanted to specify a certain settlement_batch_number where would I
>place that in the statement. Like say 784 was a settlement_batch_number I
>wanted where would I put that in the statement. Thanks
Hi tarheels4025,
SELECT settlement_batch_number, store_number, card_number, amount
FROM YourTable
WHERE settlement_batch_number = 784
GROUP BY settlement_batch_number, store_number, card_number, amount
HAVING COUNT(*) > 1
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)
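
A quick way to convince yourself that the WHERE clause belongs before GROUP BY is to run the pattern on a handful of rows. This is a sketch only: it uses Python's built-in sqlite3 module, the rows are invented, amounts are integer cents, and the extra COUNT(*) column is added here just to show how many copies were found.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE YourTable (
        settlement_batch_number INTEGER,
        store_number INTEGER,
        card_number TEXT,
        amount INTEGER               -- cents, an assumption for the demo
    );
    INSERT INTO YourTable VALUES
        (784, 1, 'CARD-A', 1000),
        (784, 1, 'CARD-A', 1000),    -- duplicate within batch 784
        (784, 1, 'CARD-B', 1000),    -- different card
        (784, 2, 'CARD-A', 1000),    -- different store
        (785, 1, 'CARD-C',  250),
        (785, 1, 'CARD-C',  250);    -- duplicate, but in another batch
""")

# WHERE narrows the rows to one batch first; GROUP BY and HAVING then
# look for repeated (store, card, amount) combinations inside it.
dupes = conn.execute("""
    SELECT settlement_batch_number, store_number, card_number, amount,
           COUNT(*) AS occurrences
    FROM YourTable
    WHERE settlement_batch_number = ?
    GROUP BY settlement_batch_number, store_number, card_number, amount
    HAVING COUNT(*) > 1
""", (784,)).fetchall()

print(dupes)
```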

Wednesday, March 28, 2012

Finding Duplicates

I have a company table and I would like to write a query that will return to
me any duplicate companies. However, it is a little more complicated than
just matching on exact company names. I would like it to give me duplicates
where x number of letters at the beginning of the company name match AND x
number of letters of the address match AND x number of letters of the city
match. I will be doing this in batches based on the first letter of the
company name. So for example I will first process all companies that start
with the letter "A".

So for all "A" companies I want to find companies where the first 5 letters
in the company name match and the first 5 characters of the address field
match and the first 5 characters of the city match. THANKS!!!
|||Can you post simplified DDLs, some sample data & expected results? For
detail refer to : www.aspfaq.com/5006

--
- Anith
( Please reply to newsgroups only )|||"Erich" <erich93063@.hotmail.com> wrote in message news:<102gbomj7gc84f7@.corp.supernews.com>...
> I have a company table and I would like to write a query that will return to
> me any duplicate companies. However, it is a little more complicated then
> just matching on exact company names. I would like it to give me duplicates
> where x number of letters at the beginning of the company name match AND x
> number of letters of the address match AND x number of letters of the city
> match. I will be doing this in batches based on the first letter of the
> company name. So for example I will first process all companies that start
> with the letter "A".
> So for all "A" companies I want to find companies where the first 5 letters
> in the company name match and the first 5 characters of the address field
> match and the first 5 characters of the city match. THANKS!!!

Something like this may work:

select t.*
from dbo.MyTable t
join
(
select
left(CompanyName, 5) as 'CompName',
left(Address, 5) as 'Addr',
left(City, 5) as 'City',
count(*) as 'Dupes'
from
dbo.MyTable
where
left(CompanyName, 1) = 'A'
group by
left(CompanyName, 5),
left(Address, 5),
left(City, 5)
having
count(*) > 1
) dt
on dt.CompName = left(t.CompanyName, 5)
and dt.Addr = left(t.Address, 5)
and dt.City = left(t.City, 5)

If this doesn't work as you expect, then please consider posting your
table DDL, as well as some sample data.

Simon|||This may work as well, assuming a parameter is passed into the stored proc
or function for the first letter:

SELECT C1.CompanyID, C2.CompanyID,
C1.CompanyName, C2.CompanyName,
C1.Address, C2.Address, C1.City, C2.City
FROM Company C1 JOIN Company C2 ON
LEFT(C1.CompanyName, 5) = LEFT(C2.CompanyName, 5) AND
LEFT(C1.Address, 5) = LEFT(C2.Address, 5) AND
LEFT(C1.City, 5) = LEFT(C2.City, 5) AND
C1.CompanyID != C2.CompanyID
WHERE LEFT(C1.CompanyName, 1) = @FirstLetter

You could also use a variable parameter instead of hard-coding "5" to allow
for more specific or more general matches,
i.e. ... LEFT(C1.CompanyName, @MatchLength) = LEFT(C2.CompanyName, @MatchLength)
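
The effect of the match length is easiest to see on a few rows. The sketch below is an illustration under assumptions: it uses Python's built-in sqlite3 module, where `substr(x, 1, n)` stands in for T-SQL's `LEFT(x, n)`; the company rows are invented; and the join uses `C1.CompanyID < C2.CompanyID` instead of `!=` so that each matching pair is reported once rather than twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Company (
        CompanyID INTEGER PRIMARY KEY,
        CompanyName TEXT, Address TEXT, City TEXT
    );
    INSERT INTO Company VALUES
        (1, 'Acme Widgets Inc',          '100 Main Street', 'Springfield'),
        (2, 'Acme Widgets Incorporated', '100 Main St.',    'Springfield IL'),
        (3, 'Acme Tools',                '100 Main Street', 'Springfield'),
        (4, 'Apex Supply',               '7 Oak Avenue',    'Shelbyville');
""")

def prefix_pairs(first_letter, match_length):
    """Return (id, id) pairs whose name, address and city share a prefix."""
    sql = """
        SELECT C1.CompanyID, C2.CompanyID
        FROM Company C1
        JOIN Company C2
          ON substr(C1.CompanyName, 1, :n) = substr(C2.CompanyName, 1, :n)
         AND substr(C1.Address, 1, :n)     = substr(C2.Address, 1, :n)
         AND substr(C1.City, 1, :n)        = substr(C2.City, 1, :n)
         AND C1.CompanyID < C2.CompanyID
        WHERE substr(C1.CompanyName, 1, 1) = :letter
        ORDER BY C1.CompanyID, C2.CompanyID
    """
    return conn.execute(sql, {"n": match_length, "letter": first_letter}).fetchall()

print(prefix_pairs("A", 5))   # loose match: "Acme " is shared by three rows
print(prefix_pairs("A", 6))   # tighter match separates Widgets from Tools
```

A shorter prefix finds more candidate duplicates but also more false positives, which is the trade-off the variable match length is meant to expose.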

Finding Duplicates

I have a pretty extensive database system I am querying, though what I am
looking for should be easy to find. I am looking for instances in the
Clients table that appear more than once (duplicate clients). I am using the
following:
Select First_Name + ' ' + Last_Name as Client,
Count(SSNum) as Count
From Clients
Group By First_Name, Last_Name, SSNum
Having Count(SSNum)>=2
Where I am confused is this; if I remove the Group By SSNum I have 126
instances. With the Group By SSNum I have 56. When I link in the account
table to get the site name and worker name I have 14 instances. Is there
another more reliable way to tell what names and ssn's happen more than
once?
TIA
|||If you include SSNum in the GROUP BY then you will get one row for every
unique (First_Name, Last_Name, SSNum) that is duplicated.
If you don't include SSNum in the GROUP BY then you will get one row for
every unique (First_Name, Last_Name) that is duplicated.
Both methods are perfectly reliable but they tell you different things. It
all depends on what answer you want.
I guess the problem with the JOIN version is that you are doing an INNER
JOIN that is eliminating some rows. Difficult to say without seeing the
code.
Please post DDL, sample data and required results if you need more help.
--
David Portas|||Ah OK. I think I have it. Does this make sense:
I run the code below and get 56 names and ssns duplicated
I add the Worker ID and get 14 names
I add the Site Name and also get 14 names
So this is basically telling me that I have 56 clients in the table that
match more than once on name and SSN, but by adding the site and worker that
goes down to 14, meaning that I have 14 duplicates with the same name AND
same site/worker. So from that I can assume of the 56 names, 42 of them are
in more than one site, but only once in those sites?
That actually makes sense in looking through the raw data, because it
appears the same client went to more than one site.
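
The point that each GROUP BY column list answers a different question can be checked on a tiny table. This is a sketch with invented names and SSNs, run through Python's built-in sqlite3 module rather than the real Clients table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Clients (First_Name TEXT, Last_Name TEXT, SSNum TEXT);
    INSERT INTO Clients VALUES
        ('Ann', 'Lee', '111111111'),
        ('Ann', 'Lee', '111111111'),   -- same person entered twice
        ('Bob', 'Ray', '222222222'),
        ('Bob', 'Ray', '333333333'),   -- same name, different SSN
        ('Cy',  'Orr', '444444444');
""")

# Grouping by name only: any repeated name counts as a duplicate.
by_name = conn.execute("""
    SELECT First_Name, Last_Name, COUNT(*)
    FROM Clients
    GROUP BY First_Name, Last_Name
    HAVING COUNT(*) >= 2
    ORDER BY Last_Name
""").fetchall()

# Grouping by name and SSN: only rows that agree on all three columns count.
by_name_and_ssn = conn.execute("""
    SELECT First_Name, Last_Name, SSNum, COUNT(*)
    FROM Clients
    GROUP BY First_Name, Last_Name, SSNum
    HAVING COUNT(*) >= 2
    ORDER BY Last_Name
""").fetchall()

print(by_name)
print(by_name_and_ssn)
```

Adding columns to the GROUP BY can only split groups further, so the duplicate count shrinks or stays the same, which matches the 126, 56, and 14 progression described in the thread.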

finding duplicates

Quick, easy question:
I have an inventory table in a database and each record has
a serial number. What is the SQL code to find duplicates?
tia
|||See following example:
create table #cartype(manufacturer varchar(500), score int)
insert into #cartype values('Toyota', 1 )
insert into #cartype values('Toyota', 2)
insert into #cartype values('Toyota', 3)
insert into #cartype values('Honda', 2)
insert into #cartype values('Honda', 5)
insert into #cartype values('Honda', 8)
insert into #cartype values('GE', 1)
insert into #cartype values('GE', 3)
insert into #cartype values('GE', 9)
insert into #cartype values('GE1', 6)
Following query will give you manufacturers that are appearing more than
once.
select manufacturer from #cartype
group by manufacturer
having count(*) > 1
--Following is the query to get the complete row details
select a.* from #cartype a inner join
(select manufacturer from #cartype
group by manufacturer
having count(*) > 1) b
on a.manufacturer = b.manufacturer
--
-Vishal
|||Katie,
SELECT serial_number, COUNT(serial_number) [Count]
FROM dbo.inventory
GROUP BY serial_number
HAVING COUNT(serial_number) > 1
--
Dinesh.
SQL Server FAQ at
http://www.tkdinesh.com
|||SELECT serial_number FROM inventory
GROUP BY serial_number
HAVING COUNT(*) > 1
And if you want the actual rows:
SELECT * FROM inventory i
INNER JOIN (
SELECT serial_number FROM inventory
GROUP BY serial_number
HAVING COUNT(*) > 1) AS j
ON i.serial_number = j.serial_number
Jacco Schalkwijk MCDBA, MCSD, MCSE
Database Administrator
Eurostop Ltd.
|||not very sure but,
I think you are using a wrong column name. Make sure all the column names that
you use in the query exist in the table (maybe there is a spelling mistake).
--
-Vishal
"katie" <kadf@.hsd.com> wrote in message
news:00c801c35062$c8f3e4f0$a101280a@.phx.gbl...
> i am doing this in access and it asks me to put a
> parameter value in. what does this mean
|||I think square brackets are used in Access to indicate a parameter, so if
you leave them off you should be ok. Note that I last used Access quite a few
years ago, so don't bank on it :-)
--
Jacco Schalkwijk MCDBA, MCSD, MCSE
Database Administrator
Eurostop Ltd.
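For what it's worth, Access usually shows the "Enter Parameter Value" prompt when it cannot resolve a name used in the query, so the dbo. prefix and the bracketed [Count] alias may well be the culprits here. A minimal sketch of the same query in a form Access should accept, assuming the table really is called inventory and the column serial_number (dup_count is just an illustrative alias):

```sql
-- Assumes a table "inventory" with a column "serial_number", as in the thread.
-- No dbo. prefix, and an explicit AS alias instead of [Count];
-- dup_count is an illustrative name, not something from the original posts.
SELECT serial_number, COUNT(*) AS dup_count
FROM inventory
GROUP BY serial_number
HAVING COUNT(*) > 1;
```

If the prompt still appears, the name it asks for is usually the one that is misspelled or missing from the table.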

Monday, March 26, 2012

finding and removing duplicates rows

Say I had a table called 'RAS', with 4 columns:
Date, Time, UserName, GroupName
How can I check for any duplicate rows based on all those columns
combined?
And once they are found, can the duplicates be deleted?
It would be nice to see 2 solutions for learning purposes... one
statement just to show the duplicates, and another separate statement
to find and delete the duplicates.
Thanks in advance!
Stephen|||CREATE TABLE #Demo (
idNo int identity(1,1),
colA int,
colB int
)
INSERT INTO #Demo(colA,colB) VALUES (1,6)
INSERT INTO #Demo(colA,colB) VALUES (1,6)
INSERT INTO #Demo(colA,colB) VALUES (2,4)
INSERT INTO #Demo(colA,colB) VALUES (3,3)
INSERT INTO #Demo(colA,colB) VALUES (4,2)
INSERT INTO #Demo(colA,colB) VALUES (3,3)
INSERT INTO #Demo(colA,colB) VALUES (5,1)
INSERT INTO #Demo(colA,colB) VALUES (8,1)
PRINT 'Table'
SELECT * FROM #Demo
PRINT 'Duplicates in Table'
SELECT * FROM #Demo
WHERE idNo IN
(SELECT B.idNo
FROM #Demo A JOIN #Demo B
ON A.idNo <> B.idNo
AND A.colA = B.colA
AND A.colB = B.colB)
PRINT 'Duplicates to Delete'
SELECT * FROM #Demo
WHERE idNo IN
(SELECT B.idNo
FROM #Demo A JOIN #Demo B
ON A.idNo < B.idNo -- < this time, not <>
AND A.colA = B.colA
AND A.colB = B.colB)
DELETE FROM #Demo
WHERE idNo IN
(SELECT B.idNo
FROM #Demo A JOIN #Demo B
ON A.idNo < B.idNo -- < this time, not <>
AND A.colA = B.colA
AND A.colB = B.colB)
PRINT 'Cleaned-up Table'
SELECT * FROM #Demo
DROP TABLE #Demo
|||Thanks for the message, it definitely helped solve a lot of issues.
Is there a way to do everything mentioned, but without having a unique
key for each record?
"Uri Dimant" <urid@.iscar.co.il> wrote in message news:<#6uUEw1CEHA.1452@.TK2MSFTNGP09.phx.gbl>...
> (snip)|||On 17 Mar 2004 05:24:49 -0800, Stephen wrote:

>Thanks for the message, it definitely helped solve a lot of issues.
>Is there a way to do everything mentioned, but without having a unique
>key for each record?
No. If all columns in two rows are the same, there's no way to tell
them apart. Any where clause will either select both rows or none of
them. If you want to eliminate duplicates, you have to add an IDENTITY
or GUID column, or else you will delete both rows instead of one of
the duplicates.
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)|||DELETE
FROM table3
WHERE sno NOT IN (SELECT MIN(sno)
FROM table3
GROUP BY fd1, fd2, fd3);
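
Applied to the RAS table from the original question, the two statements might look like the sketch below. It assumes SQL Server 2005 or later (for ROW_NUMBER and for deleting through a common table expression), which is newer than this thread, and it does not need an extra key column. [Date] and [Time] are bracketed because they double as type names; "copies", "numbered" and "rn" are illustrative names.

```sql
-- Statement 1: show each combination of the four columns that occurs more than once.
SELECT [Date], [Time], UserName, GroupName, COUNT(*) AS copies
FROM RAS
GROUP BY [Date], [Time], UserName, GroupName
HAVING COUNT(*) > 1;

-- Statement 2: number the rows inside each group of identical rows,
-- then delete everything except row 1 of each group.
-- Which copy survives is arbitrary, which is fine because the copies are identical.
WITH numbered AS (
    SELECT ROW_NUMBER() OVER (
               PARTITION BY [Date], [Time], UserName, GroupName
               ORDER BY (SELECT NULL)) AS rn
    FROM RAS
)
DELETE FROM numbered
WHERE rn > 1;
```

Running statement 1 again after the delete is a quick way to confirm that no duplicates remain.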

Wednesday, March 7, 2012

find duplicates

What is the query needed to find the records that have a duplicate in a
certain field?
Thanks|||You need to GROUP BY the column(s) you are checking for duplicates and keep
only the groups that have a count > 1 (i.e. HAVING COUNT(*) > 1). For example,
to identify any duplicate last names:
CREATE TABLE dbo.dups
(
FirstName VARCHAR(20),
LastName VARCHAR(20)
)
INSERT dbo.dups SELECT 'Peter', 'Ward'
INSERT dbo.dups SELECT 'John', 'Smith'
INSERT dbo.dups SELECT 'Jeff', 'Brown'
INSERT dbo.dups SELECT 'Dave', 'Smith'
INSERT dbo.dups SELECT 'Alex', 'Moon'
SELECT lastname
FROM dbo.dups
GROUP BY lastname
HAVING COUNT(lastname) > 1
HTH
- Peter Ward
WARDY IT Solutions
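
The query above returns only the duplicated last names. Since the question asks for the records themselves, here is a hedged sketch that returns the full rows with a windowed count. It assumes SQL Server 2005 or later and the dbo.dups table defined above; "counted" and "name_count" are illustrative names.

```sql
-- Count how many rows share each LastName, then keep the rows
-- whose LastName appears more than once.
SELECT FirstName, LastName
FROM (
    SELECT FirstName, LastName,
           COUNT(*) OVER (PARTITION BY LastName) AS name_count
    FROM dbo.dups
) AS counted
WHERE name_count > 1;
```

With the five sample rows above this should return the two Smith rows (John and Dave).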

Find duplicates

I have a query that is based on the following query that Hugo
gave me yesterday (I added CheckNumber in for the problem).
I had a problem with finding duplicate check numbers in the table
(voids, re-issued checks etc). They do show in my results, but out of
4300+ checks it is hard to find all the duplicates for auditing purposes
while setting up my script.
How do I go about changing my query to show ONLY duplicates (or vice versa)?
Here is the data:
drop table EmployeePay
drop table PayDetail
CREATE TABLE [dbo].[EmployeePay] (
[EmployeePayID] [int] IDENTITY (1, 1),
[ClientID] [int] Not NULL ,
[CheckNumber] [int] Not Null,
[PayDate] [smalldatetime] Not NULL,
PRIMARY KEY (EmployeePayID)
) ON [PRIMARY]
CREATE TABLE [dbo].[PayDetail] (
[PayDetailID] [int] IDENTITY (1, 1),
[ClientID] [int] Not NULL ,
[PayDate] [smalldatetime] Not NULL ,
[Code] [varchar] (5) Not NULL ,
[Hours] [int] Not NULL ,
[Amount] [money] Not NULL,
PRIMARY KEY (PayDetailID)
) ON [PRIMARY]
insert EmployeePay values (1,1234,'01-01-05')
insert EmployeePay values (1,2255,'02-03-05')
insert EmployeePay values (2,1331,'01-01-05')
insert EmployeePay values (2,3011,'03-06-05')
insert EmployeePay values (2,1331,'05-06-05')
insert PayDetail values (1,'01-01-05','5',20,200.50)
insert PayDetail values (1,'01-01-05','6',25,10.50)
insert PayDetail values (1,'01-01-05','13',50,120.25)
insert PayDetail values (1,'02-03-05','5',5,110)
insert PayDetail values (1,'02-03-05','9',18,250.50)
insert PayDetail values (2,'01-01-05','5',50,120.25)
insert PayDetail values (2,'01-01-05','44',10,320.32)
insert PayDetail values (2,'01-01-05','32',50,120.25)
insert PayDetail values (2,'03-06-05','46',18,235.75)
insert PayDetail values (2,'05-06-05','5',-50,-120.25)
insert PayDetail values (2,'05-06-05','44',-10,-320.32)
insert PayDetail values (2,'05-06-05','32',-50,-120.25)
and the script I am using:
CREATE VIEW PayDetailWithRank
AS
SELECT pd1.ClientID, pd1.PayDate, pd1.Code, pd1.Hours, pd1.Amount,
(SELECT COUNT(*)
FROM PayDetail AS pd2
WHERE pd2.ClientID = pd1.ClientID
AND pd2.PayDate = pd1.PayDate
AND pd2.Code < pd1.Code) AS Rank
FROM PayDetail AS pd1
go
SELECT e.ClientID, e.PayDate, e.CheckNumber,
d1.Code, d1.Hours, d1.Amount,
d2.Code, d2.Hours, d2.Amount,
d3.Code, d3.Hours, d3.Amount
FROM EmployeePay AS e
INNER JOIN PayDetailWithRank AS d1
ON d1.ClientID = e.ClientID
AND d1.PayDate = e.PayDate
AND d1.Rank % 3 = 0
LEFT JOIN PayDetailWithRank AS d2
ON d2.ClientID = e.ClientID
AND d2.PayDate = e.PayDate
AND d2.Rank / 3 = d1.Rank / 3
AND d2.Rank % 3 = 1
LEFT JOIN PayDetailWithRank AS d3
ON d3.ClientID = e.ClientID
AND d3.PayDate = e.PayDate
AND d3.Rank / 3 = d1.Rank / 3
AND d3.Rank % 3 = 2
ORDER BY e.ClientID, e.PayDate, d1.Rank
go
DROP VIEW PayDetailWithRank
go
This will give you 1 duplicate check. Can I change this to show only
duplicates, somehow?
Thanks,
Tom|||Never mind.
I figured out how. I have to modify the statement slightly:
I need to change the ORDER BY to GROUP BY and add a "HAVING COUNT(*) > 1"
clause.
Tom
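
Tom did not post the final statement. A minimal sketch of a GROUP BY / HAVING check against the EmployeePay table above, assuming a duplicate means the same CheckNumber used more than once for the same ClientID (TimesUsed is an illustrative alias):

```sql
-- One output row per check number that appears more than once for a client.
SELECT ClientID, CheckNumber, COUNT(*) AS TimesUsed
FROM EmployeePay
GROUP BY ClientID, CheckNumber
HAVING COUNT(*) > 1
ORDER BY ClientID, CheckNumber
```

With the sample data this returns a single row: ClientID 2, CheckNumber 1331, used twice.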
|||Hi
Changing the first left outer join to be an inner join will ensure at least
one duplicate.
SELECT e.ClientID, e.PayDate, e.CheckNumber,
d1.Code, d1.Hours, d1.Amount, d1.rank,
d2.Code, d2.Hours, d2.Amount, d2.rank,
d3.Code, d3.Hours, d3.Amount, d3.rank
FROM EmployeePay AS e
JOIN PayDetailWithRank AS d1
ON d1.ClientID = e.ClientID
AND d1.PayDate = e.PayDate
AND d1.Rank % 3 = 0
JOIN PayDetailWithRank AS d2
ON d2.ClientID = e.ClientID
AND d2.PayDate = e.PayDate
AND d2.Rank / 3 = d1.Rank / 3
AND d2.Rank % 3 = 1
LEFT JOIN PayDetailWithRank AS d3
ON d3.ClientID = e.ClientID
AND d3.PayDate = e.PayDate
AND d3.Rank / 3 = d1.Rank / 3
AND d3.Rank % 3 = 2
ORDER BY e.ClientID, e.PayDate, d1.Rank
What are you expecting if there are 4 duplicates?
John
|||>> had a problem with finding duplicate check numbers in the table (voids, re-issued checks etc). <<
I have a better question for you: why are you creating duplicate check
numbers? Do you think that perhaps not having a proper relational key
could be part of the problem?
If you will learn to write proper DDL, you will save a lot of complex
DML.
Also, look up the ISO-8601 format for dates and times.|||"--CELKO--" <jcelko212@.earthlink.net> wrote in message
news:1126814005.385526.122020@.z14g2000cwz.googlegroups.com...
> I have a better question for you: why are you creating duplicate check
> numbers? Do you think that perhaps not having a proper relational key
> could be part of the problem?
As I had mentioned, this is not my database. I built a table to mimic the
tables I am extracting information from, to create a CSV file that I can
import into our system. I am not creating duplicate check numbers - that
is what is in their system and I have to deal with it. I have over 5000
checks and, as it turned out, 5 duplicate check numbers. I used my GROUP
BY/HAVING solution to find them. As I mentioned, they were voids (hence
the duplicate check number) as well as a check that was re-issued (we don't
know why yet).
Are you suggesting I should spend my time creating a PROPER database schema,
even if it doesn't mirror what I am trying to accomplish?
And BTW, you have pounded incessantly on the evils of IDENTITY.
I for one am grateful they used it, as you would have seen in my other post.
Hugo's solution with the views worked great, but hadn't taken into account
multiple codes on one check (which there would have been no way for him to
know). I found that their tables used identities, and that was the only
field I could replace the Code field with to make it work. It had to be
sequential and unique - gaps would not have been an issue.
Tom
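
The change Tom describes, ranking on the identity column instead of Code, might look like the sketch below. It reuses the view from the start of the thread and swaps only the comparison column; PayDetailID stands in for whatever the identity column is called in the real tables.

```sql
CREATE VIEW PayDetailWithRank
AS
SELECT pd1.ClientID, pd1.PayDate, pd1.Code, pd1.Hours, pd1.Amount,
       (SELECT COUNT(*)
        FROM PayDetail AS pd2
        WHERE pd2.ClientID = pd1.ClientID
          AND pd2.PayDate = pd1.PayDate
          -- identity values are unique, so every row gets its own rank
          -- even when Code repeats within a ClientID/PayDate
          AND pd2.PayDetailID < pd1.PayDetailID) AS Rank
FROM PayDetail AS pd1
go
```

Gaps in the identity values do not matter here, because the rank is a count of earlier rows rather than the identity value itself.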
|||On Thu, 15 Sep 2005 13:40:38 -0700, tshad wrote:
(snip)
>Hugo's solution with the Views worked great, but hadn't taken into account
>multiple codes on one check (which there would have been no way for him to
>know this). I found that their tables used identities and that was the only
>field I could replace the Code field with to make it work. It had to be
>sequential and unique - Gaps would not have been an issue.
Hi Tom,
It would also have been solvable without the identity column. If there
was no single column to make each row unique within Client/Paydate, I'd
have used a combination of two columns. Or three. Or ... well, you get
the point. The code would have been more messy, but it would have
worked.
The only thing I would not have been able to handle were complete
duplicates (i.e. two or more rows with the same value in each of their
columns). But if you had those, you'd have much bigger problems to solve
than generating a kind-of-crosstab report. <g>
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)|||"Hugo Kornelis" <hugo@.pe_NO_rFact.in_SPAM_fo> wrote in message
news:5oqji11o59rufd33auh0psimi867s65rj7@.4ax.com...
> On Thu, 15 Sep 2005 13:40:38 -0700, tshad wrote:
> (snip)
> Hi Tom,
> It would also have been solvable without the identity column. If there
> was no single column to make each row unique within Client/Paydate, I'd
> have used a combination of two columns. Or three. Or ... well, you get
> the point. The code would have been more messy, but it would have
> worked.
I agree.
But in this case, there was no other set of columns (that I could make out)
that would have solved the problem. I am not sure you could have made
multiple columns work with a surrogate key, in this case. The tables
(earnings, taxes and deductions) could all have multiple records with the
same code, date, employee number (SSN) etc. on the same check. That was what
I was looking for; I just happened to notice that they have an identity field
there, and it solved the problem.
Thanks,
Tom
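
For comparison, the multi-column approach Hugo mentions would extend the comparison in the ranking subquery with a tie-breaker. A sketch, assuming Code followed by Amount as the ordering (an arbitrary choice for illustration, not something either poster specified). As both posters note, rows that match on every compared column still end up with the same rank, which is the case the identity column avoids.

```sql
SELECT pd1.ClientID, pd1.PayDate, pd1.Code, pd1.Hours, pd1.Amount,
       (SELECT COUNT(*)
        FROM PayDetail AS pd2
        WHERE pd2.ClientID = pd1.ClientID
          AND pd2.PayDate = pd1.PayDate
          -- a row sorts earlier if its Code is lower,
          -- or if Code is equal and its Amount is lower
          AND (pd2.Code < pd1.Code
               OR (pd2.Code = pd1.Code AND pd2.Amount < pd1.Amount))) AS Rank
FROM PayDetail AS pd1
```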
