Wednesday, March 28, 2012

Finding Duplicates

I have a company table and I would like to write a query that will return to
me any duplicate companies. However, it is a little more complicated then
just matching on exact company names. I would like it to give me duplicates
where x number of letters at the beginning of the company name match AND x
number of letters of the address match AND x number of letters of the city
match. I will be doing this in batches based on the first letter of the
company name. So for example I will first process all companies that start
with the letter "A".

So for all "A" companies I want to find companies where the first 5 letters
in the company name match and the first 5 characters of the address field
match and the first 5 characters of the city match. THANKS!!!Can you post simplified DDLs, some sample data & expected results? For
detail refer to : www.aspfaq.com/5006

--
- Anith
( Please reply to newsgroups only )|||"Erich" <erich93063@.hotmail.com> wrote in message news:<102gbomj7gc84f7@.corp.supernews.com>...
> I have a company table and I would like to write a query that will return to
> me any duplicate companies. However, it is a little more complicated then
> just matching on exact company names. I would like it to give me duplicates
> where x number of letters at the beginning of the company name match AND x
> number of letters of the address match AND x number of letters of the city
> match. I will be doing this in batches based on the first letter of the
> company name. So for example I will first process all companies that start
> with the letter "A".
> So for all "A" companies I want to find companies where the first 5 letters
> in the company name match and the first 5 characters of the address field
> match and the first 5 characters of the city match. THANKS!!!

Something like this may work:

select t.*
from dbo.MyTable t
join
(
select
left(CompanyName, 5) as 'CompName',
left(Address, 5) as 'Addr',
left(City, 5) as 'City',
count(*) as 'Dupes'
from
dbo.MyTable
where
left(CompanyName, 1) = 'A'
group by
left(CompanyName, 5),
left(Address, 5),
left(City, 5)
having
count(*) > 1
) dt
on dt.CompName = left(t.CompanyName, 5)
and dt.Addr = left(t.Address, 5)
and dt.City = left(t.City, 5)

If this doesn't work as you expect, then please consider posting your
table DDL, as well as some sample data.

Simon|||This may work as well, assuming a parameter is passed into the stored proc
or function for the first letter:

SELECT C1.CompanyID, C2.CompanyID,
C1.CompanyName, C2.CompanyName,
C1.Address, C2.Address, C1.City, C2.City
FROM Company C1 JOIN Company C2 ON
LEFT(C1.CompanyName, 5) = LEFT(C2.CompanyName, 5) AND
LEFT(C1.Address, 5) = LEFT(C2.Address, 5) AND
LEFT(C1.City, 5) = LEFT(C2.City, 5) AND
C1.CompanyID != C2.CompanyID
WHERE LEFT(C1.CompanyName, 1) = @.FirstLetter

You could also use a variable parameter instead of hard-coding "5" to allow
for more specific or more general matches.
ie, ... LEFT(C1.Company, @.MatchLength) = LEFT(C2.CompanyName, @.MatchLength)

"Erich" <erich93063@.hotmail.com> wrote in message
news:102gbomj7gc84f7@.corp.supernews.com...
> I have a company table and I would like to write a query that will return
to
> me any duplicate companies. However, it is a little more complicated then
> just matching on exact company names. I would like it to give me
duplicates
> where x number of letters at the beginning of the company name match AND x
> number of letters of the address match AND x number of letters of the city
> match. I will be doing this in batches based on the first letter of the
> company name. So for example I will first process all companies that start
> with the letter "A".
> So for all "A" companies I want to find companies where the first 5
letters
> in the company name match and the first 5 characters of the address field
> match and the first 5 characters of the city match. THANKS!!!

No comments:

Post a Comment