Wednesday, March 28, 2012

Finding displaying and deleteing dulpicate records

HI all,
I am trying to find duplicate records ina database that I have taken over. I
am using group bys and count() to find the sups.
I also realise that we could have a problem with misspellings and hterefore
not have a duplicate as in Joe Soape, Jo Soap, Joe Sope Joe Soap. These
couldpossible be sups of the same. I would terefore assume to allow for user
to decide what is dups and what to keep.
Are there any other more acceptable ways and means of doing this.
THis a a sample of what I use
SELECT Name, address1, address2,city, COUNT(*) as numbofdups FROM reinsurer
GROUP BY Name, address1, address2, city HAVING COUNT(*) > 1
Thanks
RobertIf you are using SQL 2005 developer / enterprise edition you could use the
Fuzzy Grouping / Term Extraction transformations in Integration Services and
work it out through that.
Aside from that you need to role your own in T-SQL, there are functions like
SOUNDEX that help to an extent but aren't that brilliant.
Tony.
Tony Rogerson
SQL Server MVP
http://sqlserverfaq.com - free video tutorials
"Robert Bravery" <me@.u.com> wrote in message
news:u0uuiXbbGHA.3352@.TK2MSFTNGP03.phx.gbl...
> HI all,
> I am trying to find duplicate records ina database that I have taken over.
> I
> am using group bys and count() to find the sups.
> I also realise that we could have a problem with misspellings and
> hterefore
> not have a duplicate as in Joe Soape, Jo Soap, Joe Sope Joe Soap. These
> couldpossible be sups of the same. I would terefore assume to allow for
> user
> to decide what is dups and what to keep.
> Are there any other more acceptable ways and means of doing this.
> THis a a sample of what I use
> SELECT Name, address1, address2,city, COUNT(*) as numbofdups FROM
> reinsurer
> GROUP BY Name, address1, address2, city HAVING COUNT(*) > 1
> Thanks
> Robert
>
>

No comments:

Post a Comment