Quantcast
Channel: ListenData
Viewing all articles
Browse latest Browse all 425

Extract last 4 values of variable in SAS

$
0
0
This tutorial explains how to extract last n characters or numbers of a variable in SAS. In this tutorial, we will cover several cases in which we pull last 4 character or numeric values from a column. In MS Excel, it is easily possible with RIGHT() function but there is no-inbuilt function to do it in SAS. The RIGHT() function of SAS is used for something else i.e. it right aligns string or character value.

Example 1

Suppose you have a product ID in which last 4 characters refers to a product category so you are asked to pull product category information.
data example;
productID = "AB123ZX253";
run;

data example;
set example;
referenceid = substr(productID,length(productID)-3,4);
run;

The output is shown in the image below -
Get last N Characters
Explanation

The SUBSTR() function returns sub-string from a character variable.
= SUBSTR(character-variable, beginning-position, number-of-characters-to-pull)
The LENGTH() function returns the length of a character variable. In this case, it is 10 characters long. The calculated SUBSTR() function would work like below -
= SUBSTR(productID, 10-3, 4)
= SUBSTR(productID, 7, 4)

Example 2 : Handle missing while extracting

Suppose you have multiple product IDs in which some of them are missing.
data example2;
input productID $10.;
cards;
AB123ZX253

AB123ZX254
AB123ZX255
;
run;
When you apply SUBSTR() function in case of missing cases, it returns a note in log 'Invalid second argument'. See the image below

SUBSTR : Missing Cases
To workaround this problem, we can use SUBSTRN() which handles missing cases while extracting.
data example2;
set example2;
referenceid = substrn(productID,length(productID)-3,4);
run;
Example 3 : Get last N digits from a numeric variable

Suppose product ID variable is numeric.
data example3;
input productID;
cards;
12345253

12354234
12354235
;
run;
The SUBSTR() / SUBSTRN() function works only for character variables. In order to make it run for numeric variables, we first need to convert numeric variables to character variables with PUT() function.
data example3;
set example3;
x = put(productID, 10.);
referenceid = input(substrn(x,length(x)-3,4),8.);
drop x;
proc print noobs;
run;
Since SUBSTRN() returns character variable, we need to convert it to numeric with INPUT() function.

Another Method -
data example3;
set example3;
referenceid = input(substrn(productID,INT(LOG10(productID))+1-3,4),8.);
proc print noobs;
run;
How it works -

Step 1 : LOG10(1021) returns 3.009
Step 2 : INT(LOG10(1021)) makes it 3
Step 3 : Adding 1 to step2 makes it 4, which is result of number of digits in 1021.

Viewing all articles
Browse latest Browse all 425

Trending Articles