Introduction
The aim of this paper is to teach myself the fundamentals of the XGBoost algorithm, how it is implemented in R, and how to apply the {xgboost} package to real data.
My initial knowledge of XGBoost is quite limited, so this will be a learning-as-I-write process.
Having worked only to a limited extent with supervised learning and predictive models, this approach may or may not prove useful.
What is XGBoost - and how is it commonly applied?
The abbreviation XGBoost stands for Extreme Gradient Boosting. It is a supervised learning algorithm that penalizes a model both for loss of explanatory power (training error) and for increases in complexity. Both terms are contained in the objective function, which the algorithm seeks to optimize given the data.
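In the notation of the official XGBoost documentation, the objective combines a training loss summed over the n observations with a regularization term summed over the K trees:

$$
\text{Obj}(\theta) = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega\left(f_k\right),
\qquad
\Omega(f) = \gamma T + \tfrac{1}{2}\lambda \lVert w \rVert^{2}
$$

Here $l$ is a differentiable loss (for example squared error for regression, logistic loss for classification), and $\Omega$ penalizes each tree $f_k$ through its number of leaves $T$ and the size of its leaf weights $w$, with $\gamma$ and $\lambda$ controlling how strongly complexity is punished.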
XGBoost has been implemented efficiently in R through the {xgboost} package. The framework scales well and has been used to win multiple Kaggle competitions.
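To fix ideas before the theory, here is a minimal training run. The agaricus mushroom data bundled with {xgboost} and the hyperparameters are placeholders of my choosing, not something prescribed by the sources below.

library(xgboost)

# Mushroom classification data shipped with the package.
data(agaricus.train, package = "xgboost")
data(agaricus.test, package = "xgboost")

# Ten rounds of gradient boosting with a binary logistic objective.
bst <- xgboost(data = agaricus.train$data,
               label = agaricus.train$label,
               nrounds = 10,
               objective = "binary:logistic",
               verbose = 0)

# Predicted probabilities for the held-out test observations.
pred <- predict(bst, agaricus.test$data)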
Which concepts do I need to understand before delving into Extreme Gradient Boosting?
The theory of XGBoost
How to apply the algorithm to real-life data in R
How to gain insights from the output (a first look at this step follows the code below)
Libraries:
pkgs <- c("xgboost",
          "readr",
          "stringr",
          "caret",
          "car",
          "servr",
          "knitr")

# Attach every package; require() returns FALSE (rather than erroring)
# for any package that is not installed.
lapply(pkgs, require, character.only = TRUE)
Sources:
AnalyticsVidhya
XGBoost Official Documentation (Tianqi Chen)
Understanding XGBoost Model on Otto Dataset @ Kaggle