Quantcast
Channel: ListenData
Viewing all articles
Browse latest Browse all 425

Missing Imputation with MICE Package in R

$
0
0
In R, the mice package has features of imputing missing values on mixed data.
Missing Imputation with mice package in R

Variable Type with Missing Imputation Methods
  1. For Continuous Data - Predictive mean matching, Bayesian linear regression, Linear regression ignoring model error, Unconditional mean imputation etc.
  2. For Binary Data - Logistic Regression, Logistic regression with bootstrap
  3. For Categorical Data (More than 2 categories) - Polytomous logistic regression, Proportional odds model etc,
  4. For Mixed Data (Can work for both Continuous and Categorical) - CART, Random Forest, Sample (Random sample from the observed values)
anscombe <- within(anscombe, {
y1[1:3] <- NA
y4[3:5] <- NA
})
imp = mice(anscombe)
imp1 = complete(imp)
Important Points:
  1. By default, the "mice" function creates multiple level (k=5) imputation.
  2. The "complete"function is used to prepare your final data with imputation. By default, it picks first level imputation scores.

    Custom mice function
    imp = mice(anscombe, m=1)
    imp1 = complete(imp, 1)
    Default settings in the mice package

    If nothing is specified in the method option (as shown in the above example), it checks, by default, the variable type and applies missing imputation method based on the type of variable.
    1. Predictive mean matching (continuous data)
    2. Logistic regression imputation (binary data, factor with 2 levels)
    3. Polytomous regression imputation for unordered categorical data (factor>= 2 levels)
    4. Proportional odds model (ordered, >= 2 levels)

      CART : Imputation Algorithm
      imp = mice(anscombe, meth = "cart", minbucket = 5)
      imp1 = complete(imp)
      Random Forest : Imputation Algorithm

      Simulations by Shah (Feb 13, 2014) suggested that the quality of the imputation for 10 and 100 trees was identical, so mice 2.22 changed the default number of trees from ntree = 100 to ntree = 10.
      imp = mice(anscombe, meth = "rf", ntree = 10)
      imp1 = complete(imp)
      Important Note : You can ignore minbucket and ntree in the above code. The package can take default values.

      Viewing all articles
      Browse latest Browse all 425

      Trending Articles