In classification models, we often encounter a situation where an independent variable has too many categories or levels. A simple solution is to convert the categorical variable to a numeric one and use the numeric version in the model. The easiest way to do this is to replace each raw category value with the average response value of that category. For a binary dependent variable, this average is the proportion of events within the category.
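As a minimal sketch of this idea, assume a toy data frame in which `color` is the categorical predictor and `y` is the binary response (both names are made up for illustration). In base R, `ave()` returns the group mean aligned with the original rows:

```r
# Toy data: `color` is the categorical predictor, `y` the binary response
# (both names are hypothetical, chosen only for this illustration)
toy <- data.frame(
  color = c("red", "red", "blue", "blue", "blue", "green"),
  y     = c(1, 0, 1, 1, 0, 0)
)

# ave() computes the mean of y within each level of color
# and returns it aligned with the original rows
toy$mean_color <- ave(toy$y, toy$color, FUN = mean)

print(toy)
```

Every `red` row gets 0.5 (one event out of two), every `blue` row gets 2/3, and the single `green` row gets 0.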
R Script : Converting Categorical Variables to Numeric
library(qdapTools)
# Read the data file directly from the URL (no need to disable SSL verification)
urlfile <- 'https://raw.githubusercontent.com/hadley/fueleconomy/master/data-raw/vehicles.csv'
vehicles <- read.csv(urlfile, stringsAsFactors = FALSE)
# Cleaning up the data
vehicles[is.na(vehicles)] <- 0
# Create dependent variable
vehicles$depvar <- ifelse(vehicles$cylinders == 6, 1,0)
# Specify the categorical variables to transform
combinelist <- c("drive", "fuelType")
# Replace each category with the mean of the dependent variable for that category
TransformCateg <- function(inputdata, depvar, varlist = combinelist) {
  require(qdapTools)
  # Capture the unquoted column name passed as depvar
  depvar1 <- deparse(substitute(depvar))
  result <- inputdata
  for (variable in varlist) {
    # Mean response per category, stored as a two-column lookup key
    x <- tapply(inputdata[, depvar1], inputdata[, variable], mean)
    key <- data.frame(category = names(x), value = as.vector(x), stringsAsFactors = FALSE)
    # Map each row's category to its mean response, rounded to 2 decimals
    result[[paste("mean", variable, sep = "_")]] <-
      round(lookup(as.character(inputdata[, variable]), key), 2)
  }
  return(result)
}
# Run the function
traindat2 <- TransformCateg(vehicles, depvar)
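The same transformation can also be sketched without `qdapTools`. The helper below is a hypothetical base R variant, not the script above: the name `mean_encode`, its arguments, and the toy data are assumptions for illustration, and the dependent variable is passed as a quoted column name rather than unquoted.

```r
# Hypothetical helper: mean-encode the columns named in `vars`
# using only base R (no qdapTools)
mean_encode <- function(data, target, vars) {
  for (v in vars) {
    # Mean of the target within each category
    means <- tapply(data[[target]], data[[v]], mean)
    # Find each row's category among the category names
    idx <- match(as.character(data[[v]]), names(means))
    # Add the rounded category mean as a new column
    data[[paste("mean", v, sep = "_")]] <- round(as.vector(means[idx]), 2)
  }
  data
}

# Toy data standing in for the vehicles file (values are made up)
toy <- data.frame(
  drive  = c("FWD", "FWD", "RWD", "RWD", "4WD"),
  fuel   = c("Gas", "Diesel", "Gas", "Gas", "Diesel"),
  depvar = c(1, 0, 1, 1, 0)
)

encoded <- mean_encode(toy, "depvar", c("drive", "fuel"))
print(encoded)
```

Using `match()` on the category names, rather than indexing the means by name, keeps rows with an empty-string category from turning into `NA`.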
R Script : WOE Transformation of Categorical Variables