Sunday, February 26, 2012

Find and delete reversed key pairs.

Hi!

I have a table with only two columns, which contain some city codes in the form of AAA, BBB. What I need to do is to find reversed key pairs in the same table and get rid of them so only one pair stays. In example, I have:

AAA, BBB

CCC, DDD

BBB, AAA

EEE, FFF

FFF, EEE

DDD, CCC

AAA, BBB and BBB, AAA mean the same to me, so I would like to leave only one of them. Do you have any idea on how to achieve this? I tried to use a cursor (yeah, I know cursors are bad, but this is a task I will most likely run once or every few weeks), but the table has over 130k records and it looks like I need at least a few hours for the cursor to complete it's work. Here's what I did:

Code Snippet

declare
@.origCity char(3)
,@.destCity char(3)

declare crs_cities cursor local fast_forward for
select
origCity, destCity
from
dbo.airports
order by
origCity

open crs_cities

fetch next from crs_cities
into @.origCity, @.destCity

while @.@.fetch_status = 0
begin
-- Delete rows having the same city pair in reverse order.
delete from
dbo.airports
where
destCity = @.origCity
and
origCity = @.destCity


fetch next from crs_cities
into @.origCity, @.destCity
end

close crs_cities
deallocate crs_cities


I could also add an autoincrement column before the pairs and use a while loop instead of a cursor, but there must be some SQL query to do it the proper way. Smile It may be a whole procedure, whatever.

Thanks.

delete from airports
where origCity>destCity
and exists (select * from airports a2
where a2.origCity=airports.destCity
and a2.destCity=airports.origCity)
|||You also need to adjust your insert code so that "duplicate" records do not get inserted in the first place.|||

I cannot believe it was so simple! x_x

If I may ask - why origCity>destCity and not something else. I don't get it.

Kent Waldrop 2007 Mar wrote:

You also need to adjust your insert code so that "duplicate" records do not get inserted in the first place.

This was some data imported from a CSV I got from a client, so it's not a case. Wink

|||

pawel.krakowiak wrote:

If I may ask - why origCity>destCity and not something else. I don't get it.

You need to ensure that the operation only deletes one record of each pair (without that condition it would have deleted both records of the pair). The condition arbitrarily selects the records where origCity sorts after destCity to remove. Mark could equally well have used origCity < destCity and removed the other record of the pair.

|||Thought so... I never used such construct (I mean, it depends on the context) before, though. Good to know.

No comments:

Post a Comment