Channel: ListenData

R : Convert categorical variable to numeric

In classification models, we often encounter a situation where an independent variable has too many categories or levels. A simple solution is to convert the categorical variable to numeric and use the numeric version in the model. The easiest way to do this is to replace each raw category value with the average response value of that category, that is, the mean of the dependent variable across all rows in the category.
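As a small worked illustration of this idea, the sketch below applies it to a made-up data frame. The `toy` data, its column names, and its values are assumptions for illustration only and are not part of the vehicles data.

```r
# Hypothetical toy data: one categorical predictor and a binary response
toy <- data.frame(
  colour = c("red", "red", "blue", "blue", "blue", "green"),
  y      = c(1, 0, 1, 1, 0, 0),
  stringsAsFactors = FALSE
)

# Average response per category: blue = 2/3, green = 0, red = 1/2
cat_means <- tapply(toy$y, toy$colour, mean)

# Replace each raw category with its category mean (looked up by name),
# rounded to 2 decimals
toy$mean_colour <- round(as.numeric(cat_means[toy$colour]), 2)

print(toy)
```

Every "red" row receives 0.5, every "blue" row 0.67, and the "green" row 0, so the model sees one numeric column instead of three levels.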
R Script : Converting Categorical Variables to Numeric

library(qdapTools)
# Reading data file (read.csv can read directly from an https URL,
# so SSL verification does not need to be disabled)
urlfile <- 'https://raw.githubusercontent.com/hadley/fueleconomy/master/data-raw/vehicles.csv'
vehicles <- read.csv(urlfile, stringsAsFactors = FALSE)

# Cleaning up the data: replace missing values with 0
vehicles[is.na(vehicles)] <- 0

# Create dependent variable: 1 if the vehicle has 6 cylinders, 0 otherwise
vehicles$depvar <- ifelse(vehicles$cylinders == 6, 1, 0)

# Specify categorical variables for which you need transformation
combinelist = c("drive","fuelType")

TransformCateg <- function(inputdata, depvar, varlist = combinelist){
  # depvar is passed unquoted; convert it to a column name
  depvar1 <- deparse(substitute(depvar))
  result <- inputdata
  for (variable in varlist){
    # Average response within each level of the categorical variable
    x <- tapply(inputdata[, depvar1], inputdata[, variable], mean)
    x <- data.frame(row.names(x), x, row.names = NULL)
    # Map each row's category to its average response, rounded to 2 decimals
    result[, paste("mean", variable, sep = "_")] <- round(qdapTools::lookup(inputdata[, variable], x), 2)
  }
  return(result)
}

# Run Function (adds the columns mean_drive and mean_fuelType)
traindat2 = TransformCateg(vehicles, depvar)
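
The same transformation can also be sketched in base R without the `qdapTools` dependency, using `ave()` to compute group means in row order. This is an alternative sketch rather than the script above: the function name `mean_encode`, its arguments, and the toy rows are hypothetical, and the response column is passed as a string here instead of an unquoted name.

```r
# Hypothetical helper: mean-encode several categorical columns with base R only
mean_encode <- function(data, response, vars) {
  for (v in vars) {
    # ave() returns, for every row, the mean of the response within that row's group
    data[[paste("mean", v, sep = "_")]] <-
      round(ave(data[[response]], data[[v]], FUN = mean), 2)
  }
  data
}

# Made-up rows that borrow the column names used above
toy <- data.frame(
  drive    = c("4WD", "4WD", "FWD", "FWD"),
  fuelType = c("Regular", "Premium", "Regular", "Regular"),
  depvar   = c(1, 0, 0, 0),
  stringsAsFactors = FALSE
)

encoded <- mean_encode(toy, "depvar", c("drive", "fuelType"))
print(encoded)
```

On these rows, `mean_drive` is 0.5 for "4WD" and 0 for "FWD", while `mean_fuelType` is 0.33 for "Regular" and 0 for "Premium".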

R Script : WOE Transformation of Categorical Variables 
