Predictive Analytics and Advanced Machine Learning in R and Azure

Use R and Azure ML Studio to build and tune advanced machine learning models.

Predictive analytics and machine learning techniques are revolutionizing business and government. Predictive Analytics and Advanced Machine Learning in R and Azure is aimed at professionals who want a deeper understanding of the mechanics behind the models and of how these models are realistically applied in business. Advanced machine learning techniques are covered in depth, including those used to win global data science competitions. The course also covers analytics project management and the management of stakeholder expectations during predictive analytics projects.

This course is taught in R and Azure.

Who Should Attend?

This course is suited to any professional who already understands analytics and machine learning basics and is ready to progress to higher levels of sophistication. It is also suitable for any professional who is interested in how predictive analytics projects are conceptualised, scoped and project managed.

It is recommended that students have completed an introductory R programming course or MOOC and at least one second-year university-level statistics unit.

Laptop Required Specs

Intel i3 processor, 4GB RAM

Windows operating system

Unrestricted PC with permission to install software

Software Requirements

Excel 2010 / 2013 / 2016

Latest versions of R and RStudio

A free trial or paid subscription to Microsoft Azure ML Studio

Course Objectives

  • Provide participants with a full understanding of multiple linear regression, logistic regression, market basket analysis and decision trees
  • Provide participants with a full understanding of model diagnostics and underlying assumptions for a range of predictive models that are appropriate in the business context
  • Provide participants with a framework for selecting an appropriate predictive model depending on the task at hand
  • Provide participants with a qualitative and quantitative methodology for assessing the performance of predictive models
  • Provide participants with a firm grounding in the business case for Market Basket Analysis and the mathematics underlying this technique
  • Provide participants with guidelines on how to present predictive analytics results in the corporate setting (taught via simulated boardroom presentations in class)

Upon successful completion of this course, participants will be able to:

  • Build and use predictive models to inform decision making in the face of uncertainty
  • Build robust predictive models that predict sales, revenue, employee attrition, customer churn and other business outcomes
  • Know when to apply a classification algorithm instead of a continuous prediction algorithm
  • Understand the limitations and benefits of predictive models and how to communicate these limitations and benefits to senior decision makers
  • Distinguish esoteric, academic predictive models from those that have proven robust in the business setting
  • Understand how to calculate the ROA (return on analytics) for predictive analytics projects
  • Recognise business opportunities where the investment in predictive analytics is justified in the context of project risk and expected benefits
  • Understand how to manage predictive analytics project risk proactively whilst preventing time and cost overruns

Day 1

I. Introductions (9:00am – 9:30am)

II. Dimensionality, Parsimony and Testing Accuracy (9:30am – 10:15am)

  • The curse of dimensionality
  • The principle of parsimony
  • Testing model accuracy
  • John Elder’s Target Shuffling
  • Lift charts
  • Bootstrap sampling

III. Q&A / Break (10:15am – 10:30am)

IV. Shrinkage – More Than What Happens in the Pool (10:30am – 11:00am)

  • How shrinkage methods depart from traditional statistical methods
  • Ridge regression
  • The LASSO method
  • How does the LASSO method help perform variable selection?
  • Sparsity

V. Q&A / Break (11:00am – 11:15am)

VI. Workshop: Team Activity – Let’s compare LASSO and ridge regression (11:15am – 12:00nn)

VII. Lunch (12:00nn – 1:00pm)

VIII. Workshop: Team Activity (cont.) – Let’s compare LASSO and ridge regression (1:00pm – 1:30pm)

IX. Cross Validation, Bagging and Ensembling (1:30pm – 2:15pm)

  • Bootstrap aggregation
  • K-fold cross validation
  • Model ensembling
  • Choosing weights for ensemble models

X. Q&A / Break (2:15pm – 2:30pm) 

XI. Workshop: Let’s bag, ensemble and cross validate! (2:30pm – 4:45pm)

XII. Workshop Feedback, Presentation from Winning Model and Day 1 Wrap Up (4:45pm – 5:00pm)

Day 2


I. Artificial Neural Networks (9:00am – 10:00am)

  • A gentle introduction to ANNs using colors
  • What is deep learning?
  • What are forward propagation and backpropagation?
  • How many hidden layers should we use?
  • ANN and linear regression smackdown in Azure ML Studio

II. Q&A / Break (10:00am – 10:15am)

III. Workshop: Team Activity – Let’s build and tune neural nets (10:15am – 12:00nn)

IV. Lunch (12:00nn – 1:00pm)

V. Predictive Analytics in Practice – Managing Analytics Projects and Teams (1:00pm – 1:45pm)

  • Where should the analytics team be situated in the corporate structure? Research findings.
  • Managing stakeholder expectations in analytics projects
  • The importance of having analytics champions
  • Project management for analytics projects – how does it differ from regular IT projects?

VI. Q&A / Break (1:45pm – 2:00pm)

VII. Support Vector Machines (2:00pm – 3:00pm)

  • The maximal margin classifier
  • The support vector classifier
  • Kernels and SVMs
  • Performance comparison with other classification methods

VIII. Q&A / Break (3:00pm – 3:15pm)

IX. Workshop: Let’s build and tune SVMs! (3:15pm – 4:45pm)

X. Workshop Feedback, Presentation from Winning Model and Day 2 Wrap Up (4:45pm – 5:00pm)

Day 3


I. Market Basket Analysis and Affinity Analysis I (9:00am – 9:45am)

  • What is association rule mining?
  • What is the business case for market basket analysis?
  • Support, lift and confidence
  • Visualizing market basket results

II. Q&A / Break (9:45am – 10:00am)

III. Workshop: Let’s use arules to perform market basket analysis on supermarket data (10:00am – 11:30am)

IV. Introduction to Kaggle Competitions (11:30am – 12:15pm)

  • Kaggle overview
  • Kaggle competition strategies
  • Private and public leaderboards
  • Team merging

V. Lunch (12:15pm – 1:15pm)

VI. Workshop: Team Activity (1:15pm – 4:45pm)

During this capstone team activity, course participants will enrol in a live Kaggle competition and work through the full data science process, with the aim of reaching the top 50% of the leaderboard by the end of the day. Toward the end of the session, the group will discuss a strategy for continued learning and success in the competition.

VII. Workshop Feedback and Course Wrap-up (4:45pm – 5:00pm)