Wednesday, March 28, 2012

Finding duplicate entries in a "smart" way - by comparing first two words

What is the best way to compare two entries in a single table where
the two fields are "almost" the same?

For example, I would like to write a query that would compare the
first two words in a "company" field. If they are the same, I would
like to output them.

For example, "20th Century" and "20th Century Fox" in the company
field would be the same.

How do I do this? Do I need to use a cursor? Is it as simple as using
"Like?"Here's an example:

CREATE TABLE Sometable (company VARCHAR(20) PRIMARY KEY)

INSERT INTO Sometable VALUES ('20th Century')
INSERT INTO Sometable VALUES ('20th Century Fox')
INSERT INTO Sometable VALUES ('Something Else')

SELECT company
FROM Sometable AS S
JOIN
(SELECT first_two_words
FROM
(SELECT company,
LEFT(company,
CHARINDEX(' ',company+' ',CHARINDEX(' ',company)+1))
AS first_two_words
FROM Sometable) AS X
GROUP BY first_two_words
HAVING COUNT(*)>1) AS T
ON T.first_two_words =
LEFT(company,CHARINDEX(' ',company+' ',CHARINDEX(' ',company)+1))

--
David Portas
----
Please reply only to the newsgroup
--|||>> a query that would compare the first two words in a "company"
field [sic]. If they are the same, I would like to output them. <<

Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, datatypes, etc. in
your schema are. Sample data is also a good idea, along with clear
specifications.

CREATE TABLE Foobar
(foo_key INTEGER NOT NULL PRIMARY KEY,
company CHAR(35) NOT NULL,
...);

SELECT F1.foo_key, F2.foo_key
FROM Foobar AS F1, Foobar AS F2
WHERE CHARINDEX(F1.company, F2.company) = 1
AND F1.foo_key < F2.foo_key;

No comments:

Post a Comment