Thursday, March 29, 2012

Finding last "whitespace" character in a string?

I'm trying to figure out how to find the last whitespace character in a
varchar string. To complicate things, it's not just spaces that I'm
looking for, but certain ascii characters (otherwise, obviously, just
use LEN). My initial thought was to REVERSE it, find the location
(using CHARINDEX) looking for each of those characters (so, multiple
queries), then subtract that from the LEN of the string.

The problem I'm running into is that there are about a dozen different
characters we're looking for. Any suggestions? My thought was to
(this sounds silly, so there's gotta be a better way) dump the results
from each CHARINDEX into a table, then find the MAX of the table and
use that. But, like I said, it sounds silly. I don't think I can do a
[^0-9A-Z] either, since there are non-Alphanumeric characters we're
looking for.

Many thanks."M Bourgon" wrote:

> I'm trying to figure out how to find the last whitespace character in a
> varchar string. To complicate things, it's not just spaces that I'm
> looking for, but certain ascii characters (otherwise, obviously, just
> use LEN). My initial thought was to REVERSE it, find the location
> (using CHARINDEX) looking for each of those characters (so, multiple
> queries), then subtract that from the LEN of the string.
> The problem I'm running into is that there are about a dozen different
> characters we're looking for. Any suggestions? My thought was to
> (this sounds silly, so there's gotta be a better way) dump the results
> from each CHARINDEX into a table, then find the MAX of the table and
> use that. But, like I said, it sounds silly. I don't think I can do a
> [^0-9A-Z] either, since there are non-Alphanumeric characters we're
> looking for.
> Many thanks.

Why not use LIKE but build the pattern in a variable using CHAR()?

declare @.t table (c varchar(50))

insert @.t values ('not this one')
insert @.t values ('or this one')

insert @.t values ('
not even this one')

insert @.t values ('only this one
')

declare @.crit varchar(50)
set @.crit = '%[' + CHAR(13) + CHAR(10) + ']'

select * from @.t where c like @.crit

Craig|||How about the old table of numbers trick?

first you create a table of numbers big enough to handle the length of
string you are dealing with, I'll do 8000 in this case but it could be more
(You only need to do this the once)

SELECT TOP 8000 Number = IDENTITY(int, 1, 1)
INTO Numbers
FROM master..sysobjects, master..sysobjects, master..sysobjects

Now that you have your table of numbers, you can use it to index into your
string and look for the whitespace:

SELECT Top 1 Number
FROM Numbers
WHERE Number<=Len(@.Str) AND Substring(@.Str, Number, 1) IN (char(32),
char(13), char(8))
ORDER BY Number DESC

'Get the first number from the table where the number is less or equal to
the length of the string and the character at the numbers position is in a
given set of whitespace characters, starting at the highest number'

For optimum performance create a clustered index on the table of numbers.

You may also be able to use REVERSE and PATINDEX by encoding a string of all
the whitespace characters '%['+char(8)+char(10)+char(32)+']%', although I've
never tried using '[]' with patindex and its not nearly as interesting :)

Mr Tea

"M Bourgon" <bourgon@.gmail.com> wrote in message
news:1111101450.978587.314110@.l41g2000cwc.googlegr oups.com...
> I'm trying to figure out how to find the last whitespace character in a
> varchar string. To complicate things, it's not just spaces that I'm
> looking for, but certain ascii characters (otherwise, obviously, just
> use LEN). My initial thought was to REVERSE it, find the location
> (using CHARINDEX) looking for each of those characters (so, multiple
> queries), then subtract that from the LEN of the string.
> The problem I'm running into is that there are about a dozen different
> characters we're looking for. Any suggestions? My thought was to
> (this sounds silly, so there's gotta be a better way) dump the results
> from each CHARINDEX into a table, then find the MAX of the table and
> use that. But, like I said, it sounds silly. I don't think I can do a
> [^0-9A-Z] either, since there are non-Alphanumeric characters we're
> looking for.
> Many thanks.|||Oops,
dont forget to alias the tables in the cross join, if you dont have access
to sysobjects you can use any table with a decent amount of records.

master..sysobjects a, master..sysobjects b, master..sysobjects c

Mr Tea

"Lee Tudor" <mr_tea@.ntlworld.com> wrote in message
news:l2w_d.814$MO6.640@.newsfe2-gui.ntli.net...
> How about the old table of numbers trick?
> first you create a table of numbers big enough to handle the length of
> string you are dealing with, I'll do 8000 in this case but it could be
> more (You only need to do this the once)
> SELECT TOP 8000 Number = IDENTITY(int, 1, 1)
> INTO Numbers
> FROM master..sysobjects, master..sysobjects, master..sysobjects
> Now that you have your table of numbers, you can use it to index into your
> string and look for the whitespace:
> SELECT Top 1 Number
> FROM Numbers
> WHERE Number<=Len(@.Str) AND Substring(@.Str, Number, 1) IN (char(32),
> char(13), char(8))
> ORDER BY Number DESC
> 'Get the first number from the table where the number is less or equal to
> the length of the string and the character at the numbers position is in a
> given set of whitespace characters, starting at the highest number'
> For optimum performance create a clustered index on the table of numbers.
> You may also be able to use REVERSE and PATINDEX by encoding a string of
> all the whitespace characters '%['+char(8)+char(10)+char(32)+']%',
> although I've never tried using '[]' with patindex and its not nearly as
> interesting :)
> Mr Tea
> "M Bourgon" <bourgon@.gmail.com> wrote in message
> news:1111101450.978587.314110@.l41g2000cwc.googlegr oups.com...
>> I'm trying to figure out how to find the last whitespace character in a
>> varchar string. To complicate things, it's not just spaces that I'm
>> looking for, but certain ascii characters (otherwise, obviously, just
>> use LEN). My initial thought was to REVERSE it, find the location
>> (using CHARINDEX) looking for each of those characters (so, multiple
>> queries), then subtract that from the LEN of the string.
>>
>> The problem I'm running into is that there are about a dozen different
>> characters we're looking for. Any suggestions? My thought was to
>> (this sounds silly, so there's gotta be a better way) dump the results
>> from each CHARINDEX into a table, then find the MAX of the table and
>> use that. But, like I said, it sounds silly. I don't think I can do a
>> [^0-9A-Z] either, since there are non-Alphanumeric characters we're
>> looking for.
>>
>> Many thanks.
>>|||On 17 Mar 2005 15:17:31 -0800, M Bourgon wrote:

>I'm trying to figure out how to find the last whitespace character in a
>varchar string. To complicate things, it's not just spaces that I'm
>looking for, but certain ascii characters (otherwise, obviously, just
>use LEN). My initial thought was to REVERSE it, find the location
>(using CHARINDEX) looking for each of those characters (so, multiple
>queries), then subtract that from the LEN of the string.
>The problem I'm running into is that there are about a dozen different
>characters we're looking for. Any suggestions?

Hi M,

Yep - use PATINDEX instead of CHARINDEX. In the example below, I search
for space and char(8) (tab) only, but it's easy to add other whitespace
characters. To test it, run the code below, uncomment the commented line
and run it again - you'll see that first the space, then the tabl is
found.

declare @.a varchar(100)
set @.a = 'This is a test'
-- + char(8) + 'tabbed'
select @.a
declare @.LastWhite int
set @.LastWhite = len(@.a) - patindex('%[ ' + char(8) + ']%', reverse(@.a))
+ 1
select @.LastWhite
select substring(@.a, @.LastWhite, 99)

Best, Hugo
--

(Remove _NO_ and _SPAM_ to get my e-mail address)|||D'oh! That's exactly what I was looking for. Many thanks, everyone.|||Okay, I thought that was the answer, but not quite. My problem is that
I'm looking to send back the "this is a test", even if it's technically
'this is a test ' + char(8) + char(8). Patindex will send the first
instance, which in this case would mean there's still a char(8) & a
space left. I saw some code that does something like this, where it
uses a while loop to step through the table, but I'd rather avoid that
if possible.

I'll give Lee's "table of Numbers" trick a shot next. Thanks, all.|||On 18 Mar 2005 07:18:21 -0800, M Bourgon wrote:

>Okay, I thought that was the answer, but not quite. My problem is that
>I'm looking to send back the "this is a test", even if it's technically
>'this is a test ' + char(8) + char(8). Patindex will send the first
>instance, which in this case would mean there's still a char(8) & a
>space left. I saw some code that does something like this, where it
>uses a while loop to step through the table, but I'd rather avoid that
>if possible.
>I'll give Lee's "table of Numbers" trick a shot next. Thanks, all.

Hi M,

So I guess that you're not looking for the last whitespace, but for the
last non-whitespace? Very easy to do - just add one caret (^) to my
code:

declare @.a varchar(100)
set @.a = 'This is a test ' + char(8) + char(8)
select @.a
declare @.LastWhite int
set @.LastWhite = len(@.a) - patindex('%[^ ' + char(8) + ']%',
reverse(@.a)) + 1
select @.LastWhite
select substring(@.a, @.LastWhite, 99)

Best, Hugo
--

(Remove _NO_ and _SPAM_ to get my e-mail address)|||Aha! Boy, am I dense.

One correction, for future generations reading this: change the len(@.a)
to datalength(@.a), in case all that's at the end are spaces (since it's
a varchar, it'll automatically drop spaces at the end). You could
probably change it to a char(100) as well, but this way you're
(hopefully) not using up as much memory.

Thanks again, Hugo.|||Hugo Kornelis (hugo@.pe_NO_rFact.in_SPAM_fo) writes:
> Yep - use PATINDEX instead of CHARINDEX. In the example below, I search
> for space and char(8) (tab) only,

The nit-picking department like to point out that char(8) is backspace. Tab
is char(9).

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp|||On Sat, 19 Mar 2005 18:03:07 +0000 (UTC), Erland Sommarskog wrote:

>Hugo Kornelis (hugo@.pe_NO_rFact.in_SPAM_fo) writes:
>> Yep - use PATINDEX instead of CHARINDEX. In the example below, I search
>> for space and char(8) (tab) only,
>The nit-picking department like to point out that char(8) is backspace. Tab
>is char(9).

Hi Erland,

As someone who occasionaly uses ^H in messages to sufficiently geeky
persons, I really should have known that...

Thanks for picking my nit!

Best, Hugo
--

(Remove _NO_ and _SPAM_ to get my e-mail address)

No comments:

Post a Comment