Super Learner Prediction in R

Enoch Kan
Apr 4, 2017

SuperLearner is an R package that implements the super learner prediction method. It is still under development, and its performance is limited by factors such as sample size. Even so, I ran a super learner model for a Kaggle competition and obtained a very good overall prediction score (0.92) and rank (1st of 34).

Super learning is a general loss-based learning method that was proposed and analyzed theoretically in van der Laan et al. (2007). It is designed to search for the optimal combination of a collection of traditional prediction algorithms, such as Random Forest and k-Nearest Neighbors.
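To make "searching for the optimal combination" concrete, here is a minimal base-R sketch of the idea, not the package's actual implementation. It assumes simulated data, only two candidate learners (the sample mean and a linear model), 5-fold cross-validation, and a single convex weight chosen to minimise cross-validated squared error. All variable names are made up for illustration.

```r
# Simplified illustration of the super learner idea using base R only.
# Assumptions: simulated data, two candidate learners, squared-error loss.
set.seed(1)
n <- 200
x <- runif(n, -2, 2)
y <- 1 + 2 * x + rnorm(n)
dat <- data.frame(x = x, y = y)

# Assign each observation to one of V folds
V <- 5
fold <- sample(rep(seq_len(V), length.out = n))

# Cross-validated predictions: one column per candidate learner
Z <- matrix(NA_real_, nrow = n, ncol = 2,
            dimnames = list(NULL, c("mean", "lm")))
for (v in seq_len(V)) {
  train <- dat[fold != v, ]
  test  <- dat[fold == v, ]
  Z[fold == v, "mean"] <- mean(train$y)
  Z[fold == v, "lm"]   <- predict(lm(y ~ x, data = train), newdata = test)
}

# Cross-validated risk (mean squared error) of a convex combination,
# where w is the weight on the linear model and 1 - w the weight on the mean
cv_risk <- function(w) {
  mean((y - (w * Z[, "lm"] + (1 - w) * Z[, "mean"]))^2)
}

# Pick the weight in [0, 1] with the smallest cross-validated risk
w_lm <- optimize(cv_risk, interval = c(0, 1))$minimum
weights <- c(mean = 1 - w_lm, lm = w_lm)
print(round(weights, 3))
```

The final predictor would then refit each learner on the full data and combine their predictions with these weights. The SuperLearner package generalises this to many learners and, by default, estimates the weights with non-negative least squares.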

The R package "SuperLearner" also contains a library of prediction algorithms to be used in the super learner. The user fits a model by calling the SuperLearner function; Eric Polley (2011) demonstrates an example of such a call.

Note that in the SuperLearner function, Y is the response vector containing all the training outcomes, and SL.library is a character vector naming the prediction methods chosen by the user. For example, SL.library = c('SL.glm', 'SL.nnet', 'SL.cforest') specifies three prediction methods: glm (Generalised Linear Model), nnet (Neural Network) and cforest (a random forest built from conditional inference trees), which are combined to calculate the optimal prediction.
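As a rough sketch of how these arguments fit together (this is not Polley's original example), the code below fits a super learner on simulated data. The data, the variable names and the choice of library are assumptions made for illustration. SL.cforest is left out because it needs the party package; SL.mean, a simple baseline shipped with SuperLearner, is used in its place.

```r
library(SuperLearner)  # install.packages("SuperLearner") if needed

# Simulated training data (hypothetical, for illustration only)
set.seed(42)
n <- 300
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
Y <- 1 + 2 * X$x1 - X$x2 + rnorm(n)

# Candidate prediction methods; SL.nnet relies on the nnet package
sl_lib <- c("SL.mean", "SL.glm", "SL.nnet")

# Fit the super learner with 5-fold cross-validation
fit <- SuperLearner(Y = Y, X = X, family = gaussian(),
                    SL.library = sl_lib, cvControl = list(V = 5))

print(fit$cvRisk)  # cross-validated risk of each candidate method
print(fit$coef)    # weight given to each method in the final combination

# Predict on new observations using the weighted combination
newX <- data.frame(x1 = c(0, 1), x2 = c(0, -1))
pred <- predict(fit, newdata = newX, onlySL = TRUE)
print(pred$pred)
```

A method with a coefficient of zero contributes nothing to the final prediction, so comparing fit$coef with fit$cvRisk is a quick way to see which algorithms the super learner actually relies on.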
