Thursday, March 29, 2012

Finding illegal characters

I'm trying to find all occurances of bad data in a varchar column. The bad
characters include: /";:<>|[],.'?~`!$%^&*()-+={}\
I need a quick and dirty way to do this.
I tried:
--Doesn't work...
WHERE MyColumn LIKE '%[/";:<>|[],.''?~`!$%^&*()-+={}\]%'
--Works, but too much code.
WHERE MyColumn like '%[|]%' OR MyColumn LIKE '%[[]%' OR ....
Ideas Anyone?It might be better to identify and define the characters you want to allow,
rather than the opposite.
What's nice about this approach is thatyou can indicate A-Z, a-z and 0-9 in
a range. Just add any other allowable characters to the end.
CREATE TABLE #foo
(
bar VARCHAR(32)
)
INSERT #foo(bar) SELECT 'legal';
INSERT #foo(bar) SELECT 'not legal -- ~?@.#@.#';
-- find the legal ones
SELECT * FROM #foo
WHERE PATINDEX('%[^A-Za-z0-9]%', bar) = 0
-- find the bad ones
SELECT * FROM #foo
WHERE PATINDEX('%[^A-Za-z0-9]%', bar) > 0
DROP TABLE #foo
"Billg_sd" <Billgsd@.discussions.microsoft.com> wrote in message
news:85320AA5-EA1F-4C3E-A4BB-CB85ECA6D17D@.microsoft.com...
> I'm trying to find all occurances of bad data in a varchar column. The
> bad
> characters include: /";:<>|[],.'?~`!$%^&*()-+={}\
> I need a quick and dirty way to do this.
> I tried:
> --Doesn't work...
> WHERE MyColumn LIKE '%[/";:<>|[],.''?~`!$%^&*()-+={}\]%'
> --Works, but too much code.
> WHERE MyColumn like '%[|]%' OR MyColumn LIKE '%[[]%' OR ....
> Ideas Anyone?|||Billg_sd (Billgsd@.discussions.microsoft.com) writes:
> I'm trying to find all occurances of bad data in a varchar column. The
> bad characters include: /";:<>|[],.'?~`!$%^&*()-+={}\
> I need a quick and dirty way to do this.
> I tried:
> --Doesn't work...
> WHERE MyColumn LIKE '%[/";:<>|[],.''?~`!$%^&*()-+={}\]%'
> --Works, but too much code.
> WHERE MyColumn like '%[|]%' OR MyColumn LIKE '%[[]%' OR ....
Here's variation that works:
WHERE a LIKE '%[-[/";:<>|,.''?~`!$%^&*()+={}\]%' OR a like '%]%'
There were two problems in the original string:
1) The hyphen must come first, or else it denotes a range.
2) The right bracket must be tested for separately.
As Aaron says, it may be better to test for permitted characters instead,
but that may be devilish too. For instance, his example was:
'%[^A-Za-z0-9]%'
But the result of this range depends on the collation. For instance,
in a Finnish-Swedish collation it would match the word "coperative",
whereas it would not in many other collations.
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx

No comments:

Post a Comment