Joining and Merging in R
This tutorial explains how to join (merge) two tables in R. Let's create two tables. Table I: DF1. df1 <- data.frame(ID = c(1, 2, 3, 4, 5), w = c('a', 'b', 'c', 'd',...
Impute Missing Values with Decision Tree
CART has a built-in algorithm to impute missing data with surrogate variables. The surrogate splits the data in exactly the same way as the primary split; in other words, we are looking for clones, close...
Missing Imputation with MICE Package in R
In R, the mice package can impute missing values in mixed data. Variable Type with Missing Imputation Methods: For Continuous Data - Predictive mean...
Create Dummy Columns From Categorical Variable
The following code returns new dummy columns from a categorical variable. DF <- data.frame(strcol = c("A", "A", "B", "F", "C", "G", "C", "D", "E", "F")) for(level in...
Ensemble Learning : Stacking / Blending
Stacking (aka Blending): Stacking is a form of ensemble learning which aims to improve accuracy by combining predictions from several learning algorithms. Step I : Multiple...
Time Series Forecasting - ARIMA [Part 1]
Introduction: Time Series : A time series is a data series consisting of several values over a time interval, e.g. daily BSE Sensex closing point, weekly sales and monthly profit of a company...
Cost Sensitive Learning For Churn Model
Model I : Churn (Attrition) Model - To identify customers who are more likely to leave. Background : It is known that retaining an existing customer is about five times cheaper than acquiring a new...
Time Series Forecasting - ARIMA [Part 2]
In this part we cover the process of performing ARIMA with SAS, with a little theory in between. Hope you have gone through Part 1 of this series; here comes Part 2. Data File Location :...
Text Mining Basics
Text Mining Terminologies: Document is a sentence. For example, "Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the...
Parallelizing Machine Learning Algorithms
In R, we can run machine learning algorithms in parallel mode with the doParallel and caret packages. library(caret) library(doParallel) set.seed(1) ctrl <-...
Time Series Forecasting - ARIMA [Part 3]
Here comes the climax of the Time Series Forecasting - ARIMA series. Hope you have gone through and enjoyed the previous two articles in the series; if not, please do. 1. Time Series...
Linear Regression with R
This article explains how to run linear regression with R. The code below covers the assumption testing and evaluation of model performance : Data Preparation; Testing of...
Predicting Transformed Dependent Variable
In linear regression models, we generally transform our dependent variable to treat heteroscedasticity, non-normality of errors and non-linear relationship between dependent and independent...
Regression : Transform Negative Values
In linear regression, the Box-Cox transformation is widely used to transform the target variable so that linearity and normality assumptions can be met. But the Box-Cox transformation can be used only for strictly...
Loops in R
This tutorial explains how to write loops in R. It also explains the APPLY family of functions. Create a sample data set: dat <- data.frame(x = c(1:5,NA), z = c(1, 1, 0, 0, NA,0),...
Return Multiple Values For Lookup Value
This tutorial explains how to get multiple values for a single lookup value. Sample Data: Multiple Values with Vlookup. Output: Output Snapshot. Formula: Enter the following formula in G9 and paste it down and to...
R Which Function Explained
In R, the which() function gives you the position of elements of a logical vector that are TRUE. Examples: 1. which(letters=="z") returns 26. Create a simple data frame: ls = data.frame( x1 =...
R : Variable Selection - Wald Chi-Square Analysis
In logistic regression, we can select top variables based on their high Wald chi-square value. In other words, we can run univariate analysis of each independent variable and then pick important...
Calculate Wald Chi-Square Mathematically
In Logistic Regression, Wald Chi-Square is used to assess whether a variable is statistically significant or not. Wald Chi-Square = Square of (Coefficient Estimate /...
Excel : Find longest word in a cell
I was asked to find the longest word in a cell. I was able to solve it via formula after an hour of scratching my head. It can also be done easily with a UDF (User Defined Function). Excel Formula : Find...