This week I presented “Introduction to Machine Learning with XGBoost” to my stats colleagues. Why did I want to learn XGBoost and introduce it to other statisticians? XGBoost is one of the most popular machine learning algorithms these days. It won the RAAD (Roche Advanced Analytics Data) challenge, in which 81 teams from 19 Roche sites competed in early 2018! The teams tried many different methods (including logistic regression), and the top 3 teams all used XGBoost! Moreover, XGBoost is used in more than half of the winning solutions in machine learning challenges hosted on Kaggle. I was surprised to find out that Genentech launched a competition, “Cervical Cancer Screening: Help prevent cervical cancer by identifying at-risk populations”, on Kaggle in 2015 (with a prize of £100,000). The first-place and second-place solutions both used XGBoost, and they achieved an AUC of 0.96 on the test data!
If what you have read so far has made you curious about XGBoost, you can read my slides “Introduction to Machine Learning with XGBoost” here. If you press the “p” key once the HTML page is open, you can read the speaker notes, which provide more information than the slides themselves (I tried not to put too much text on the slides).
The talk starts with “Introduction to XGBoost”, followed by “How to use XGBoost in R”. Last but not least, “Further Readings” points you to some online materials for further learning.
I hope you enjoy reading them!
PS: The slides were created with R Markdown using the xaringan package. The plots and numbers in the slides were inserted automatically by R Markdown, so you might be interested in learning more from “Create Presentation Slides using xaringan through R Markdown”.