This tutorial explains how to use regular expressions (Perl-style pattern matching) in SAS.
Sample Data
data x;
infile datalines truncover;
input name $100.;
datalines;
Deepanshu
How are you, deepanshu
dipanshu
deepanshu is a good boy
My name is deepanshu
Deepanshu Bhalla
Deepanshuuu
DeepanshuBhalla
Bhalla Deepanshu
;
run;
Important Functions for Pattern Matching
1. PRXMATCH
Searches for a pattern match and returns the position at which the pattern is found.
Syntax: PRXMATCH(perl-regular-expression, source). PRXMATCH returns the position in source at which the match begins. If there is no match, it returns 0.
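To see the returned position itself, you can print it to the log. This is a minimal sketch; the data set name pos_demo is hypothetical, and the string is one of the sample values above. "deepanshu" starts at the 14th character, and a pattern that never occurs gives 0.

```sas
data pos_demo;
  length text $40;
  text = 'How are you, deepanshu';
  /* Case-insensitive search: the match starts at character 14 */
  pos_found = prxmatch('/deepanshu/i', text);
  /* The pattern does not occur anywhere, so PRXMATCH returns 0 */
  pos_missing = prxmatch('/bhalla/i', text);
  put pos_found= pos_missing=;   /* log: pos_found=14 pos_missing=0 */
run;
```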
Example 1 :
data xx;
set x;
if prxmatch("/Deepanshu/", name) > 0 then flag = 1;
if prxmatch("/Deepanshu/i", name) > 0 then flag1 = 1;
if prxmatch("/^Deepanshu/i", name) > 0 then flag2 = 1;
if prxmatch("/\bDeepanshu\b/i", name) > 0 then flag3 = 1;
if prxmatch("/D[ai]panshu/i", name) > 0 then flag4 = 1;
if prxmatch("/D.panshu/i", name) > 0 then flag5 = 1;
run;

proc print data=xx;
run;
[Image: Output : Pattern Matching]
Important Points
- The /i modifier makes the search case-insensitive.
- The ^ anchor matches the start of the string, so the pattern only matches values that begin with the search string.
- \b matches a word boundary, so \bDeepanshu\b matches Deepanshu only as a whole word (not in Deepanshuuu or DeepanshuBhalla).
- \B matches a non-word boundary: a position between two word characters or between two non-word characters.
- A character class such as [ai] matches exactly one character from the list, here either a or i.
- The . (dot) matches any single character.
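\B is described above but not used in Example 1. Here is a minimal sketch, assuming a small hypothetical data set (xb) built from three of the sample names. \Bbhalla requires a word character immediately before "bhalla", so only DeepanshuBhalla should get flag6 = 1; in the other two names, Bhalla follows a space or starts the string.

```sas
data xb;
  infile datalines truncover;
  input name $100.;
  /* \B: the character before "bhalla" must also be a word character */
  if prxmatch('/\Bbhalla/i', name) > 0 then flag6 = 1;
  datalines;
Deepanshu Bhalla
DeepanshuBhalla
Bhalla Deepanshu
;
run;

proc print data=xb;
run;
```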
Example 2 : Search Multiple Substrings
data temp;
Input company $30.;
cards;
Tata
tata
Tataz
TataM Jan
Tata Motor
Reliance World
Reliance Ltd
Reliance Petro
Reliance Global
Vanucoverltd Company
;
run;
data temp1;
set temp;
/* Subsetting IF: keep rows where Tata or Reliance appears as a whole word */
if prxmatch("/\b(Tata|Reliance)\b/i", company) > 0;
run;
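To see which rows the subsetting IF keeps, you can flag matches instead of dropping rows. This sketch uses a hypothetical data set (boundary_demo) with a few of the company values above. Because of \b, Tataz and TataM Jan are not matched: the character after Tata is a letter, so there is no word boundary there.

```sas
data boundary_demo;
  infile datalines truncover;
  input company $30.;
  /* 1 when Tata or Reliance appears as a whole word, 0 otherwise */
  match = (prxmatch('/\b(Tata|Reliance)\b/i', company) > 0);
  datalines;
Tata Motor
tata
Tataz
TataM Jan
Reliance Ltd
;
run;

proc print data=boundary_demo;
run;
```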
Example 3 : Find Pattern
Suppose you are asked to find strings that are exactly 4 characters long, where the first character is a letter and the remaining three are digits.
data _null_;
x = 'A345';
x2 = 'A55A';
y = prxmatch("/^[a-zA-Z][0-9]{3}$/", x);
y2 = prxmatch("/^[a-zA-Z][0-9]{3}$/", x2);
put y= y2=;   /* log: y=1 y2=0 */
run;
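One caveat when you apply the same pattern to a data set variable: SAS pads character variables with trailing blanks up to their defined length, and those blanks sit between the last digit and the end of the string, so the $ anchor fails. The sketch below assumes a hypothetical data set (codes) with a $10 variable and compares the raw value with TRIM(code).

```sas
data codes;
  length code $10;
  input code $;
  /* CODE is stored as e.g. 'A345' plus 6 trailing blanks, so $ cannot match */
  no_trim = prxmatch('/^[a-zA-Z][0-9]{3}$/', code);
  /* TRIM removes the trailing blanks, so $ matches at the real end of the value */
  with_trim = prxmatch('/^[a-zA-Z][0-9]{3}$/', trim(code));
  datalines;
A345
A55A
B123
1234
;
run;

proc print data=codes;
run;
```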
2. PRXCHANGE Function
Performs a pattern-matching replacement.
Syntax: PRXCHANGE(perl-regular-expression, times, source). The times argument sets how many matches are replaced; -1 replaces every match.
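This is a minimal sketch of the times argument. It uses a hypothetical data set name (times_demo) and the string 'banana': 1 replaces only the first match, and -1 replaces all of them.

```sas
data times_demo;
  length once all $20;
  /* times = 1: replace only the first "a" */
  once = prxchange('s/a/X/', 1, 'banana');
  /* times = -1: replace every "a" */
  all = prxchange('s/a/X/', -1, 'banana');
  put once= all=;   /* log: once=bXnana all=bXnXnX */
run;
```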
Suppose you are asked to replace 'Tata' with 'Tata Group'.
Note : The s at the start of the expression (s/pattern/replacement/) indicates substitution.
data temp2;
set temp;
Company0 = PrxChange('s/\b(Tata)\b/Tata Group/i' , -1 , strip(company));
run;
proc print;
run;
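The parentheses around Tata create a capture group that the replacement above does not use. With /i and a literal replacement, a value such as tata is overwritten as Tata Group. As a sketch (the data set name backref_demo is hypothetical), $1 in the replacement re-inserts the captured text and so keeps the original case.

```sas
data backref_demo;
  infile datalines truncover;
  input company $30.;
  length literal keep_case $40;
  /* Literal replacement text: "tata" becomes "Tata Group" */
  literal = prxchange('s/\b(Tata)\b/Tata Group/i', -1, strip(company));
  /* $1 re-inserts the text captured by (Tata): "tata" becomes "tata Group" */
  keep_case = prxchange('s/\b(Tata)\b/$1 Group/i', -1, strip(company));
  datalines;
Tata
tata
Tata Motor
;
run;

proc print data=backref_demo;
run;
```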
To remove a list of keywords such as Jan, Ltd and Company, add this statement to a DATA step that reads TEMP (for example, the TEMP2 step above):
Company1 = PrxChange('s/\b(Jan|ltd|Company)\b//i' , -1 , strip(company));
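Deleting a keyword can leave extra blanks behind. For example, a keyword in the middle of a value leaves two spaces. One way to tidy this up is to wrap the result in COMPBL and STRIP. The sketch below uses a hypothetical data set (temp3) and one made-up value (Tata Jan Motor) alongside values from the sample data.

```sas
data temp3;
  infile datalines truncover;
  input company $30.;
  length company1 company2 $30;
  /* Removing the keywords can leave doubled or trailing blanks */
  company1 = prxchange('s/\b(Jan|ltd|Company)\b//i', -1, strip(company));
  /* COMPBL collapses runs of blanks; STRIP drops leading/trailing blanks */
  company2 = strip(compbl(company1));
  datalines;
TataM Jan
Reliance Ltd
Tata Jan Motor
Vanucoverltd Company
;
run;

proc print data=temp3;
run;
```

Note that Vanucoverltd keeps its "ltd" because \b requires ltd to be a whole word.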