Discussion:
capping outliers
rashmi
2009-03-19 05:56:08 UTC
I am trying to cap outliers in some data: each value should become the
minimum of the original value and 1 times the 80th percentile of the
variable (var80). The file has the variables varstd (the standard
deviation of var), var80 (the 80th percentile) and var (the original
variable).
The code that I have written is:

data expt;
input var@@;
datalines;
57 82 31 65 25 212
42 35 55 50 25 55
54 43 187 567 987
12 0 91 76 65 657
527 879 907 73 53
94 27 1225 25 765
1980
;
run;

proc univariate data=expt plot;
var var;
run;

proc univariate data=expt noprint;
var var;
output out=out std=varstd
pctlpts=80
pctlpre=var;
run;

data expt ;
set expt;
if(_n_ eq 1) then set out;
if varstd > 2*var80 then
var_est = min(var,(1*var80));
else var_est = var;
run;

proc univariate data=expt plot;
var var;
run;

But this does not seem to work.
Could anyone help me?
Alex Murphy
2009-03-19 06:13:53 UTC
I think you have to merge the data set back to the file.

The expt file contains no varstd field. A merge data step will allow you to
merge the dataset back to expt.
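
One way to attach a one-row summary to every observation is a PROC SQL cross join; the conditional SET in the original post is another. The following is only a sketch: the data set names stats and expt_stats are introduced here for illustration and do not appear in the thread, and PROC SQL does not promise to keep the original row order.

```sas
data expt;
  input var @@;
  datalines;
57 82 31 65 25 212 42 35 55 50 25 55 54 43 187 567 987
12 0 91 76 65 657 527 879 907 73 53 94 27 1225 25 765 1980
;
run;

/* One-row summary: standard deviation and 80th percentile of var */
proc univariate data=expt noprint;
  var var;
  output out=stats std=varstd pctlpts=80 pctlpre=var;
run;

/* Cross join: every row of expt is paired with the single row of stats */
proc sql;
  create table expt_stats as
  select a.*, b.varstd, b.var80
  from expt as a, stats as b;
quit;
```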

Regards,
Murphy
--
Regards,
Murphy Choy

Certified Advanced Programmer for SAS V9
Certified Basic Programmer for SAS V9
DataShaping Certified SAS Professional
Arthur Tabachneck
2009-03-19 12:15:42 UTC
If you're only trying to cap at the 80th percentile, then wouldn't the
following modification of your last data step do what you want?:

data expt ;
set expt;
if(_n_ eq 1) then set out;
var_est = min(var,var80);
run;

HTH,
Art
Robin R High
2009-03-19 15:43:30 UTC
If you look at the first few records of the last expt file, you'll see

Obs    var     varstd    var80    var_est

  1     57    453.882      657         57
  2     82    453.882      657         82
  3     31    453.882      657         31

so the IF statement you entered is never true:

if varstd > 2*var80 then ..... ;
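
A quick way to confirm this is to evaluate the condition once against the one-row summary data set instead of on every observation. This is a minimal sketch; threshold and is_true are names made up for the check. With the values in the listing above (varstd = 453.882, var80 = 657), the right-hand side is 2 * 657 = 1314, so the comparison is false.

```sas
data expt;
  input var @@;
  datalines;
57 82 31 65 25 212 42 35 55 50 25 55 54 43 187 567 987
12 0 91 76 65 657 527 879 907 73 53 94 27 1225 25 765 1980
;
run;

proc univariate data=expt noprint;
  var var;
  output out=out std=varstd pctlpts=80 pctlpre=var;
run;

/* Write the pieces of the IF condition to the log */
data _null_;
  set out;
  threshold = 2 * var80;           /* right-hand side of the original IF */
  is_true = (varstd > threshold);  /* 1 if the condition holds, 0 otherwise */
  put varstd= var80= threshold= is_true=;
run;
```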


The standard deviation isn't relevant if you just want to change all the
"large" values above the 80th percentile to the 80th percentile, so this
would do it:

if var > var80 then ..... ;

If you intend to identify the outliers with twice the standard deviation,
then so-called large values could be given a new value that is less than
other large, legitimate values above the 80th percentile.
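
One reading of "identify the outliers with twice the standard deviation" is to flag values above mean + 2*std. The sketch below works under that assumption and caps flagged values at that same threshold rather than at var80, so a capped value never ends up below an unflagged one. The names stats, varmean, upper, is_outlier and expt_capped are introduced here for illustration and are not from the thread.

```sas
data expt;
  input var @@;
  datalines;
57 82 31 65 25 212 42 35 55 50 25 55 54 43 187 567 987
12 0 91 76 65 657 527 879 907 73 53 94 27 1225 25 765 1980
;
run;

/* Mean is requested in addition to the statistics used in the thread */
proc univariate data=expt noprint;
  var var;
  output out=stats mean=varmean std=varstd pctlpts=80 pctlpre=var;
run;

data expt_capped;
  set expt;
  if _n_ = 1 then set stats;        /* summary values are retained on every row */
  upper = varmean + 2 * varstd;     /* assumed outlier threshold */
  is_outlier = (var > upper);       /* 1 for flagged rows, 0 otherwise */
  if var > upper then var_est = upper;
  else var_est = var;               /* missing values of var stay missing */
run;

/* Inspect the capped variable, not the original one */
proc univariate data=expt_capped plot;
  var var_est;
run;
```

Replacing upper with var80 in the IF statement's assignment gives the rule described above, where flagged values can land below unflagged values that lie between var80 and the threshold.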

However, as a disclaimer: without knowing the sample size or the reason for
changing so-called "outliers" to the 80th percentile, this seems like a
rather odd decision rule to implement.

Robin High
UNMC





b***@gmail.com
2017-07-22 13:22:02 UTC
I have code for capping data at mean + 3 standard deviations, but it is not working properly.
Here is the code:
data no_outlier;
set test2;
if PURCHASES > 7526.76 then PURCHASES = 7526.76;
else if ONEOFF_PURCHASES > 5657.831 then ONEOFF_PURCHASES = 5657.831;
else if PURCHASES_TRX > 90.57464 then PURCHASES_TRX = 90.57464;
else if PURCHASES_INSTALLMENTS_FREQUENCY > 1.563099 then PURCHASES_INSTALLMENTS_FREQUENCY = 1.563099;
else if PAYMENTS > 10513.91 then PAYMENTS = 10513.91;
else if CASH_ADVANCE_TRX > 24.05144 then CASH_ADVANCE_TRX = 24.05144;
else if BALANCE > 7887.93 then BALANCE = 7887.93;
else if CREDIT_LIMIT > 15499.81 then CREDIT_LIMIT = 15499.81;
else if PRC_FULL_PAYMENT > 1.048116 then PRC_FULL_PAYMENT = 1.048116;
run;

Could somebody please help? After running this, 2.6% of the values are still outliers. Previously I had 7.8% outliers, so it has improved (: but 2.6% are still present.
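
One likely cause: in an IF / ELSE IF chain, only the first true condition runs for each row, so a row whose PURCHASES exceeds its cap never has PAYMENTS, BALANCE or the other variables checked. A minimal sketch, assuming each variable should be capped independently, is to loop over the variables with paired arrays. The two-row test2 below is a made-up stand-in, since the real data are not shown in the thread; the cap values are the ones from the post, and the array names vars and caps are hypothetical.

```sas
/* Hypothetical stand-in for test2: row 1 exceeds several caps at once */
data test2;
  input PURCHASES ONEOFF_PURCHASES PURCHASES_TRX PURCHASES_INSTALLMENTS_FREQUENCY
        PAYMENTS CASH_ADVANCE_TRX BALANCE CREDIT_LIMIT PRC_FULL_PAYMENT;
  datalines;
9000 6000 100 0.5 12000 30 8000 16000 0.2
100 50 3 0.1 200 0 500 1000 .
;
run;

data no_outlier;
  set test2;
  array vars{9} PURCHASES ONEOFF_PURCHASES PURCHASES_TRX
                PURCHASES_INSTALLMENTS_FREQUENCY PAYMENTS CASH_ADVANCE_TRX
                BALANCE CREDIT_LIMIT PRC_FULL_PAYMENT;
  /* Upper limits in the same order as vars */
  array caps{9} _temporary_ (7526.76 5657.831 90.57464 1.563099 10513.91
                             24.05144 7887.93 15499.81 1.048116);
  do i = 1 to dim(vars);
    /* Each variable is tested on its own; missing values are left alone */
    if vars{i} > caps{i} then vars{i} = caps{i};
  end;
  drop i;
run;
```

Even with this change, some rows may still be flagged if the mean + 3*std limits are recomputed on no_outlier, because capping lowers both the mean and the standard deviation, and this code only caps the upper tail.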