Discussion:
Regular expression for compress() c modifier?
Ben Powell
2008-12-18 14:33:48 UTC
Calling all regular expressionists...

I've come across the compress() modifiers that enable you to filter out odd
characters. The variety I had printed as rectangles but were not tabs. Using
the compress() modifier of c cleans them out. What would the prx method of
this be, keeping all alphanumeric characters and punctuation in the string?

All comments much appreciated (I still don't know what the odd character was)

Rgds
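The following is a minimal sketch of the behaviour described above. It assumes an ASCII-based SAS 9 session. The data set DEMO_C, the variables RAW and CLEAN, and the sample value are hypothetical. The odd bytes are built with hex literals because they cannot be typed into a program directly.

```sas
/* Hypothetical sample: "AB", line feed ('0A'x), "CD", DEL ('7F'x), "EF" */
data demo_c;
  length raw clean $ 20;
  raw = 'AB' || '0A'x || 'CD' || '7F'x || 'EF';
  /* Empty 2nd argument: no characters are listed explicitly.          */
  /* 'c' modifier: add control characters to the list to be removed.   */
  clean = compress(raw, , 'c');
  /* $HEX shows the raw bytes before and after the compression */
  put raw= $hex40. / clean= $hex40.;
run;
```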
./ ADD NAME=Data _null_,
2008-12-18 15:02:40 UTC
Are you not able to print the value using $HEX format?
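Here is a minimal sketch of the $HEX approach, assuming an ASCII-based session. First, dump the whole value with $HEXw. Then walk the value byte by byte and report anything outside printable 7-bit ASCII, meaning RANK below 32 or at least 127. The data set ODD_CHARS and the sample string are hypothetical.

```sas
data odd_chars (keep=pos hexval);
  length raw $ 20 c $ 1 hexval $ 2;
  raw = 'Report' || '0B'x || 'Q4' || '81'x;   /* hypothetical sample */
  /* Whole value in hex: each character becomes two hex digits */
  put raw= $hex40.;
  /* Report every byte that is not printable 7-bit ASCII */
  do pos = 1 to length(raw);
    c = substr(raw, pos, 1);
    if rank(c) < 32 or rank(c) >= 127 then do;
      hexval = put(c, $hex2.);
      put 'Odd character at position ' pos 'hexval=' hexval;
      output;
    end;
  end;
run;
```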
Ben Powell
2008-12-18 15:31:47 UTC
That did help to identify them, but a general rule would be useful, as I had
to plug away to get these:

if index(title,"0A"x)>0 or index(title,"0B"x)>0 or index(title,"7F"x)>0 or
index(title,"81"x)>0;

Thanks
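The chain of INDEX calls can be collapsed into one FINDC call, which searches for any character from a list. The same hex literal can also serve as the second argument to COMPRESS to delete those bytes. This sketch assumes the four bytes found above are the complete set. FLAGGED and the sample string are hypothetical.

```sas
data flagged;
  length raw clean $ 20;
  raw = 'AB' || '0A'x || 'C' || '81'x || 'D';   /* hypothetical sample */
  /* FINDC returns the position of the first character found from the list */
  has_odd = (findc(raw, '0A0B7F81'x) > 0);
  /* The same list as COMPRESS's 2nd argument deletes those bytes */
  clean = compress(raw, '0A0B7F81'x);
  put has_odd= clean=;
run;
```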
Ben Powell
2008-12-18 15:51:18 UTC
Post by Nat Wooding
Ben
Why are you trying to use regular expressions when you could use some of
the enhanced features of Compress() in 9.1?
Curiosity - if that functionality was added in 9.1 as you say, it's pretty
good. I was just wondering what the prx equivalent was.

Rgds
Nat Wooding
2008-12-18 15:47:27 UTC
Ben

Why are you trying to use regular expressions when you could use some of
the enhanced features of Compress() in 9.1?

Are there special characters that you want to include?

If not, you can tell Compress to keep letters of the alphabet and/or
numbers and remove the rest.

Data test;
string = '123*?>abcdfghij';
newstring = compress( string , , 'nk' );

* n = include in the list of characters letters, numbers, and the
underscore;
* k = keep the items in the list ;
put string= / newstring= ;
run;

This gets rid of the symbols that I included in string. I was too lazy to
include hex characters. I see in the documentation that you can include hex
characters as a group.
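Applied to Ben's requirement of keeping letters, digits, punctuation and blanks, the keep approach might look like the sketch below. The modifier letters a, d, p and k come from the COMPRESS documentation. The sample string is hypothetical. What counts as punctuation for 'p' may depend on the session encoding, so it is worth checking on real data. The 's' modifier is deliberately not used here. It adds line feed, vertical tab and other white-space characters to the list, which would keep some of the very bytes Ben wants removed.

```sas
data keep_demo;
  length raw clean $ 30;
  raw = 'Q4 (draft)' || '0B'x || ': 12.5%' || '81'x;   /* hypothetical sample */
  /* 'k' = keep (rather than remove) the characters in the list;       */
  /* 'a' adds letters, 'd' digits, 'p' punctuation to that list,       */
  /* and the blank in the 2nd argument keeps ordinary spaces too.      */
  clean = compress(raw, ' ', 'kadp');
  put clean=;
run;
```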

Nat Wooding
Environmental Specialist III
Dominion, Environmental Biology
4111 Castlewood Rd
Richmond, VA 23234
Phone:804-271-5313, Fax: 804-271-2977



CONFIDENTIALITY NOTICE: This electronic message contains
information which may be legally confidential and/or privileged and
does not in any case represent a firm ENERGY COMMODITY bid or offer
relating thereto which binds the sender without an additional
express written confirmation to that effect. The information is
intended solely for the individual or entity named above and access
by anyone else is unauthorized. If you are not the intended
recipient, any disclosure, copying, distribution, or use of the
contents of this information is prohibited and may be unlawful. If
you have received this electronic transmission in error, please
reply immediately to the sender that you have received the message
in error, and delete it. Thank you.
Ben Powell
2008-12-18 16:28:07 UTC
LOL, we're in the same position then - I'll stick with COMPRESS.

Regards

On Thu, 18 Dec 2008 11:01:00 -0500, Nat Wooding <***@DOM.COM>
wrote:
<snip>
There may be a regular expression that will do this but I have not yet
learned to use such expressions and have not found a suitable medication
that will keep my head from hurting when I look at them.
Nat
Nat Wooding
2008-12-18 16:01:00 UTC
Ben

The abbreviated documentation in TS486 reads

COMPRESS (str,<rem>,<mod>) removes blanks OR chars specified in rem from
str;
added in V9, optional parameter mod modifies 2nd parameter

If you go to the 9.1 online docs for the compress function, there is a
table of the values that can appear in "Mod".

I do not have to do this type of activity very often, so I have to play a
bit to find the combination that I need. Note that you can use REM to
list specific items to remove or, as in the example that I sent, use one of
the lists that SAS offers, such as the digits 0 through 9, which may be
specified by a single letter.
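Here is a small sketch of the distinction between REM and MOD, using a hypothetical string. REM lists individual characters, while a modifier letter stands for a whole class of characters. The two can also be combined.

```sas
data rem_vs_mod;
  length s r1 r2 r3 $ 20;
  s  = '123*?>abc';              /* hypothetical sample */
  r1 = compress(s, '*?>');       /* REM only: remove exactly these three characters */
  r2 = compress(s, , 'd');       /* MOD only: 'd' stands for the digits 0-9         */
  r3 = compress(s, '*', 'd');    /* both: '*' plus every digit                      */
  put r1= / r2= / r3=;
run;
```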

There may be a regular expression that will do this but I have not yet
learned to use such expressions and have not found a suitable medication
that will keep my head from hurting when I look at them.

Nat

Kevin Viel
2008-12-18 21:17:43 UTC
Hmm, were is the Evil Petting Zoo (keeper?) when you need him...or even
David :)

Here is an opener:

data _null_ ;
length x y z $ 5 ;
x = "12" || "0A"x || "45" ;
put x= ;
y = x ;
x = prxchange( "s/\x0A//" , -1 , x ) ;
put x= / x= hex. / y= hex. ;
z = prxchange( "s/\W//" , -1 , y ) ;
put z= / y= hex. / z= hex. ;
run ;

x=12
45
x=1245
x=3132343520
y=31320A3435
z=1245
y=31320A3435
z=3132343520
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.01 seconds

I am not sure of your list, but you have a variety of adjustments, too.
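The \W pattern above also removes punctuation and blanks, so it is broader than what Ben asked for. A prx sketch closer to the original request negates a bracket expression of POSIX classes. Anything that is not a letter, digit, punctuation mark or blank is then deleted. Compiling the pattern once with PRXPARSE and retaining the ID avoids recompiling it for every observation. PRX_KEEP and the sample string are hypothetical. Class membership for bytes above '7F'x may also vary by SAS release and platform.

```sas
data prx_keep;
  length raw clean $ 30;
  retain re;
  /* Compile once: [^...] matches any character NOT in the listed classes, */
  /* so the substitution deletes everything except letters, digits,        */
  /* punctuation and blanks                                                 */
  if _n_ = 1 then re = prxparse('s/[^[:alnum:][:punct:] ]//');
  raw = 'Q4 (draft)' || '0B'x || ': 12.5%' || '81'x;   /* hypothetical sample */
  /* -1 = replace every match, not just the first */
  clean = prxchange(re, -1, raw);
  put clean=;
run;
```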


HTH,

Kevin
Kevin Viel
2008-12-18 21:48:47 UTC
Post by Kevin Viel
Hmm, were is the Evil Petting Zoo (keeper?) when you need him...
Holy Cow! I meant where.... Cannot even put that down to bad typing :(
Ken Borowiak
2008-12-18 21:50:07 UTC
Kevin,

I just needed a nudge.

I think some of the POSIX bracketed expressions can be used in
conjunction with a substitution/compression regex to do the job.
e.g.

data _null_ ;
length x y $ 5 ;
x = "12" || "0A"x || "45" ;
put x= ;
y = x ;
x = prxchange( "s/[[:cntrl:]]//" , -1 , x ) ;
put x= / x= hex. / y= hex. ;

run ;

SAS log:
x=12RECTANGLE45
x=1245
x=3132343520
y=31320A3435

[:cntrl:] - Control characters, equivalent to the user-defined character
class [\x00-\x1F\x7F]
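The following variant of the example above writes the class out explicitly, so it does not depend on how a given release defines [[:cntrl:]]. The \xHH escapes are interpreted by the regex engine, so the pattern itself contains only printable characters. CNTRL_EXPLICIT is a hypothetical data set name.

```sas
data cntrl_explicit;
  length x y $ 5;
  x = '12' || '0A'x || '45';
  /* Explicit class: bytes 00-1F plus DEL (7F), written as \x escapes */
  y = prxchange('s/[\x00-\x1F\x7F]//', -1, x);
  put x= $hex10. / y= $hex10.;
run;
```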

Happy Holidays!
Ken Borowiak
Chang Chung
2008-12-19 17:26:55 UTC
On Thu, 18 Dec 2008 16:50:07 -0500, Ken Borowiak <***@AOL.COM>
wrote:
..
Post by Ken Borowiak
[:cntrl:] - Control characters, equivalent to the user-defined character
class [\x00-\x1F\x7F]
hi, ken,

This may be platform or SAS release dependent. On my PC, SAS 9.1.3 SP4 on
Windows Vista, the Perl regex POSIX character class [:cntrl:] does not seem
to match '7F'x.
On the other hand, the "c" modifier in the optional third argument of the SAS
compress() function compresses out several more characters.
Maybe the POSIX character class expects only the traditional 7-bit ASCII
characters and no extended ones.
It may be safer to just spell out exactly which characters to remove,
instead of relying on the modifier or the character class.
cheers,
chang

data _null_;
do i = 0 to 255;
char = byte(i);
if i=32 then continue; /* '20'x is the space. not a cntrl char */
c1 = lengthn(compress(char, ,"c"))=0;
c2 = prxmatch("/[[:cntrl:]]/", char)^=0;
if c1+c2 then put char= :hex2. +1 (c1-c2) (= :1.);
end;
run;
/* on log
char=00 c1=1 c2=1
char=01 c1=1 c2=1
char=02 c1=1 c2=1
char=03 c1=1 c2=1
char=04 c1=1 c2=1
char=05 c1=1 c2=1
char=06 c1=1 c2=1
char=07 c1=1 c2=1
char=08 c1=1 c2=1
char=09 c1=1 c2=1
char=0A c1=1 c2=1
char=0B c1=1 c2=1
char=0C c1=1 c2=1
char=0D c1=1 c2=1
char=0E c1=1 c2=1
char=0F c1=1 c2=1
char=10 c1=1 c2=1
char=11 c1=1 c2=1
char=12 c1=1 c2=1
char=13 c1=1 c2=1
char=14 c1=1 c2=1
char=15 c1=1 c2=1
char=16 c1=1 c2=1
char=17 c1=1 c2=1
char=18 c1=1 c2=1
char=19 c1=1 c2=1
char=1A c1=1 c2=1
char=1B c1=1 c2=1
char=1C c1=1 c2=1
char=1D c1=1 c2=1
char=1E c1=1 c2=1
char=1F c1=1 c2=1
char=7F c1=1 c2=0
char=81 c1=1 c2=0
char=8D c1=1 c2=0
char=8F c1=1 c2=0
char=90 c1=1 c2=0
char=9D c1=1 c2=0
*/
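One way to follow the advice to spell the characters out is to check an explicit list against the 'c' modifier on a given installation. This sketch loops over all 256 byte values and records any disagreement. The list simply copies the c1=1 rows from the log above (SAS 9.1.3, Windows Vista). Whether the same list holds elsewhere is an assumption, and that is exactly what the loop tests. As in the original step, the blank is skipped because LENGTHN of a blank is 0, so it would otherwise look like a removed character. An empty DISAGREE data set means the list reproduces 'c' on that installation.

```sas
data disagree (keep=hexval by_c by_list);
  length ch $ 1 hexval $ 2;
  /* Hypothetical explicit list: 00-1F, 7F and the extended bytes from the log */
  re = prxparse('/[\x00-\x1F\x7F\x81\x8D\x8F\x90\x9D]/');
  do i = 0 to 255;
    if i = 32 then continue;                        /* skip the blank */
    ch = byte(i);
    by_c    = (lengthn(compress(ch, , 'c')) = 0);   /* removed by the c modifier   */
    by_list = (prxmatch(re, ch) > 0);               /* matched by the explicit list */
    if by_c ne by_list then do;
      hexval = put(ch, $hex2.);
      put 'Disagreement: ' hexval= by_c= by_list=;
      output;
    end;
  end;
run;
```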
Ken Borowiak
2008-12-19 19:07:26 UTC
Post by Chang Chung
..
Post by Ken Borowiak
[:cntrl:] - Control characters, equivalent to the user-defined character
class [\x00-\x1F\x7F]
hi, ken,
This may be platform or SAS release dependent. On my PC, SAS 9.1.3 SP4 on
Windows Vista, the Perl regex POSIX character class [:cntrl:] does not seem
to match '7F'x.
Chang,

I am seeing something similar under SAS 9.1.3 SP4 on Windows XP, but not in
V9.2. Possibly this 'unintended feature' was fixed???

data _null_ ;
do x='7F'x, '8F'x, '90'x ;
put x= x= hex. ;

if prxmatch( '/[[:cntrl:]]/', x ) then put ' [:CNTRL:] matches X' ;
else put ' [:CNTRL:] does not match X ' ;

if prxmatch( '/[[:ascii:]]/', x ) then put ' [:ASCII:] matches X ' ;
else put ' [:ASCII:] does not match X ' ;

if prxmatch( '/[\x00-\x1F\x7F]/', x ) then put ' [\x00-\x1F\x7F] matches ' ;
else put '[\x00-\x1F\x7F] does not match X ' ;

if prxmatch( '/[[:graph:]]/', x ) then put ' [[:graph:]] matches ' ;
else put '[[:graph:]] does not match X ' ;
end ;
put / ;
stop ;
run ;


Under SAS 9.1.3 SP4 Windows XP
x= x=7F
[:CNTRL:] does not match X
[:ASCII:] matches X
[\x00-\x1F\x7F] matches
[[:graph:]] does not match X
x= x=8F
[:CNTRL:] does not match X
[:ASCII:] does not match X
[\x00-\x1F\x7F] does not match X
[[:graph:]] does not match X
x= x=90
[:CNTRL:] does not match X
[:ASCII:] does not match X
[\x00-\x1F\x7F] does not match X
[[:graph:]] does not match X

Under SAS 9.2 Windows XP
x= x=7F
[:CNTRL:] matches X
[:ASCII:] matches X
[\x00-\x1F\x7F] matches
[[:graph:]] does not match X
x= x=8F
[:CNTRL:] matches X
[:ASCII:] does not match X
[\x00-\x1F\x7F] does not match X
[[:graph:]] does not match X
x= x=90
[:CNTRL:] matches X
[:ASCII:] does not match X
[\x00-\x1F\x7F] does not match X
[[:graph:]] does not match X

From the 9.2 run, it appears that [:cntrl:] is not equivalent to
[\x00-\x1F\x7F].

I concur with you: whether using SAS function modifiers or predefined regex
character classes, you should know what you are getting into if you choose
to use them.

Happy Holidays!
KBueno
Ben Powell
2008-12-22 10:41:16 UTC
Thanks for the posts; this was particularly useful:

data test;
do i = 0 to 255;
char = byte(i);
if i=32 then continue; /* '20'x is the space. not a cntrl char */
c1 = lengthn(compress(char, ,"c"))=0;
c2 = prxmatch("/[[:cntrl:]]/", char)^=0;
if c1+c2 then put char= :hex2. +1 (c1-c2) (= :1.);
output;
end;
run;
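With the `output` statement, the scan becomes a data set. The bytes that the c modifier removes on a given installation can then be collected into a macro variable and reused as an explicit COMPRESS list. This sketch assumes the TEST data set produced by the step above. The macro variable name CTRL_HEX, the data set CLEANED and the sample string are hypothetical.

```sas
/* Collect the hex codes of every byte that compress(..., 'c') removed */
data _null_;
  length hexlist $ 512;
  retain hexlist '';
  set test end=last;
  if c1 = 1 then hexlist = cats(hexlist, put(char, $hex2.));
  if last then call symputx('ctrl_hex', hexlist);
run;

%put Bytes removed by the c modifier: &ctrl_hex;

/* Reuse the list as a hex literal: the macro reference resolves first */
data cleaned;
  length raw clean $ 20;
  raw = 'AB' || '0A'x || 'CD' || '81'x;   /* hypothetical sample */
  clean = compress(raw, "&ctrl_hex"x);
run;
```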


Best Festive Wishes
