Post by Chang Chung
Post by Ken Borowiak
[:cntrl:] - Control characters, equivalent to the user-defined character class [\x00-\x1F\x7F]
Hi, Ken,
This may be platform- or SAS-release-dependent. On my PC (SAS 9.1.3 SP4 on Windows Vista), the Perl regular expression POSIX character class [:cntrl:] does not seem to match '7F'x.
Chang,
I am seeing something similar under SAS 9.1.3 SP4 on Windows XP, but not in
V9.2. Possibly this 'unintended feature' was fixed?
data _null_ ;
  /* Test DEL ('7F'x) and two extended bytes against each character class */
  do x = '7F'x, '8F'x, '90'x ;
    put x= x= $hex2. ;
    if prxmatch( '/[[:cntrl:]]/', x ) then put ' [:CNTRL:] matches X' ;
    else put ' [:CNTRL:] does not match X ' ;
    if prxmatch( '/[[:ascii:]]/', x ) then put ' [:ASCII:] matches X ' ;
    else put ' [:ASCII:] does not match X ' ;
    if prxmatch( '/[\x00-\x1F\x7F]/', x ) then put ' [\x00-\x1F\x7F] matches ' ;
    else put '[\x00-\x1F\x7F] does not match X ' ;
    if prxmatch( '/[[:graph:]]/', x ) then put ' [[:graph:]] matches ' ;
    else put '[[:graph:]] does not match X ' ;
  end ;
  put / ;
  stop ;
run ;
Under SAS 9.1.3 SP4 on Windows XP:
x= x=7F
[:CNTRL:] does not match X
[:ASCII:] matches X
[\x00-\x1F\x7F] matches
[[:graph:]] does not match X
x= x=8F
[:CNTRL:] does not match X
[:ASCII:] does not match X
[\x00-\x1F\x7F] does not match X
[[:graph:]] does not match X
x= x=90
[:CNTRL:] does not match X
[:ASCII:] does not match X
[\x00-\x1F\x7F] does not match X
[[:graph:]] does not match X
Under SAS 9.2 on Windows XP:
x= x=7F
[:CNTRL:] matches X
[:ASCII:] matches X
[\x00-\x1F\x7F] matches
[[:graph:]] does not match X
x= x=8F
[:CNTRL:] matches X
[:ASCII:] does not match X
[\x00-\x1F\x7F] does not match X
[[:graph:]] does not match X
x= x=90
[:CNTRL:] matches X
[:ASCII:] does not match X
[\x00-\x1F\x7F] does not match X
[[:graph:]] does not match X
From the 9.2 run, it appears that [:cntrl:] is not equivalent to [\x00-\x1F\x7F]: [:cntrl:] also matches '8F'x and '90'x, which the explicit class does not.
I concur: whether you use SAS function modifiers or predefined regex character classes, you should know what you are getting into if you choose to use them.
Happy Holidays!
KBueno
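Post by Chang Chung
On the other hand, the "c" modifier in the optional third argument of the SAS COMPRESS() function removes several more characters: in the log below, '7F'x, '81'x, '8D'x, '8F'x, '90'x and '9D'x are removed by COMPRESS but not matched by [:cntrl:].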
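Maybe the POSIX character class covers only the traditional 7-bit ASCII control characters and no extended ones, although that would not explain '7F'x (DEL), which is itself a 7-bit ASCII control character.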
It may be safer to just spell out exactly which characters to remove,
instead of relying on the modifier or the character class.
cheers,
chang
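data _null_;
  do i = 0 to 255;
    char = byte(i);
    /* Skip '20'x (space): it is not a control char, and LENGTHN returns 0 for a blank */
    if i=32 then continue;
    /* c1 = 1 if COMPRESS with the "c" modifier removes the character */
    c1 = lengthn(compress(char, ,"c"))=0;
    /* c2 = 1 if the POSIX class [:cntrl:] matches the character */
    c2 = prxmatch("/[[:cntrl:]]/", char)^=0;
    /* Report only characters caught by at least one of the two tests */
    if c1+c2 then put char= :$hex2. +1 (c1-c2) (= :1.);
  end;
run;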
/* on log
char=00 c1=1 c2=1
char=01 c1=1 c2=1
char=02 c1=1 c2=1
char=03 c1=1 c2=1
char=04 c1=1 c2=1
char=05 c1=1 c2=1
char=06 c1=1 c2=1
char=07 c1=1 c2=1
char=08 c1=1 c2=1
char=09 c1=1 c2=1
char=0A c1=1 c2=1
char=0B c1=1 c2=1
char=0C c1=1 c2=1
char=0D c1=1 c2=1
char=0E c1=1 c2=1
char=0F c1=1 c2=1
char=10 c1=1 c2=1
char=11 c1=1 c2=1
char=12 c1=1 c2=1
char=13 c1=1 c2=1
char=14 c1=1 c2=1
char=15 c1=1 c2=1
char=16 c1=1 c2=1
char=17 c1=1 c2=1
char=18 c1=1 c2=1
char=19 c1=1 c2=1
char=1A c1=1 c2=1
char=1B c1=1 c2=1
char=1C c1=1 c2=1
char=1D c1=1 c2=1
char=1E c1=1 c2=1
char=1F c1=1 c2=1
char=7F c1=1 c2=0
char=81 c1=1 c2=0
char=8D c1=1 c2=0
char=8F c1=1 c2=0
char=90 c1=1 c2=0
char=9D c1=1 c2=0
*/
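Following the suggestion to spell out exactly which characters to remove, here is a minimal sketch. It assumes a single-byte, ASCII-based session encoding (such as WLATIN1 on Windows) and that the set to remove is '00'x-'1F'x plus '7F'x, the set Ken quoted; adjust the list to your own data. The data set name work.clean_demo and the variables s, ctl, clean1 and clean2 are hypothetical, chosen for illustration only. The removal list is written out by hand for COMPRESS(), and the same explicit class is used in PRXCHANGE().

```sas
/* Sketch: remove an explicitly listed set of control bytes.           */
/* Assumes a single-byte ASCII-based session encoding (e.g. WLATIN1).  */
/* work.clean_demo, s, ctl, clean1, clean2 are hypothetical names.     */
data work.clean_demo;
  length s clean1 clean2 $20 ctl $33;
  keep s clean1 clean2;

  /* Test string with a tab ('09'x), DEL ('7F'x) and an extended byte ('8F'x) */
  s = 'a' || '09'x || 'b' || '7F'x || 'c' || '8F'x || 'd';

  /* Build the removal list by hand: '00'x-'1F'x followed by '7F'x */
  do i = 0 to 31;
    substr(ctl, i + 1, 1) = byte(i);
  end;
  substr(ctl, 33, 1) = '7F'x;

  /* COMPRESS with an explicit character list instead of the "c" modifier */
  clean1 = compress(s, ctl);

  /* The same explicit class in a Perl regular expression; -1 replaces all matches */
  clean2 = prxchange('s/[\x00-\x1F\x7F]//', -1, s);

  put s= $hex14. / clean1= $hex10. / clean2= $hex10.;
run;
```

Under those assumptions the log should show s=6109627F638F64, and clean1 and clean2 should both be 6162638F64: the tab and DEL are removed, while '8F'x is kept because it is not in the list. Whatever set you settle on, it is written in the program itself rather than inferred from a modifier or a POSIX class whose behavior differed between releases.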