A Crash Course in Survival Analysis: Customer Churn (Part II)
Joshua Cortez, a member of our Data Science Team, has put together a series of blog posts on using survival analysis to predict customer churn. This is part two of the series.
The survival curve is fundamental to survival analysis. It gives the probability that a customer is still subscribed to the company after a given amount of time. The longer the time, the lower the probability of surviving.
One way of estimating the survival curve is to fit a Weibull distribution, a natural choice for modelling time-to-death data. Using it, we can sketch the survival curve of a typical customer in the example dataset from the first blog post.
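To make this concrete, here is a minimal sketch of a Weibull survival curve in plain Python. It assumes the common parameterisation S(t) = exp(-(t/λ)^ρ), with a scale λ in months and a shape ρ. The function names and the parameter values are hypothetical placeholders for illustration, not the values fitted to the blog's dataset. Setting S(t) = 0.5 and solving for t gives the median survival time, t = λ(ln 2)^(1/ρ), which is the point where the curve crosses 50%.

```python
import math


def weibull_survival(t: float, scale: float, shape: float) -> float:
    """Probability that a customer is still subscribed after t months.

    Assumes the parameterisation S(t) = exp(-(t / scale) ** shape).
    """
    if t < 0:
        raise ValueError("t must be non-negative")
    return math.exp(-((t / scale) ** shape))


def weibull_median(scale: float, shape: float) -> float:
    """Time at which S(t) = 0.5, i.e. half of customers are expected to have churned."""
    return scale * math.log(2) ** (1.0 / shape)


# Hypothetical parameters for illustration only; NOT fitted to the blog's data.
SCALE, SHAPE = 150.0, 1.2

for months in (0, 12, 60, 120, 240):
    print(f"{months:>3} months: S(t) = {weibull_survival(months, SCALE, SHAPE):.3f}")
print(f"median survival time: {weibull_median(SCALE, SHAPE):.1f} months")
```

The shape parameter is worth inspecting after a fit: with ρ > 1 the churn hazard rises with tenure, and with ρ < 1 it falls, so it indicates whether long-standing customers become more or less likely to leave over time.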
Here’s our survival curve. It isn’t as steep as you might expect. For instance, there’s around a 50% chance that a customer will stay longer than 120 months (10 years). It is also important to check whether this is consistent with the business’s understanding of its customer lifecycle. If customers are churning much earlier or later than the business believes, it may need to adjust its customer lifecycle management.
It may also be a good idea to intervene and incentivise customers who have already stayed for 10 years. Since their probability of staying dips below 50% at that point, without intervention they are more likely to churn than not.
Comparing Survival Curves
We can start by looking at total counts: how many females have churned compared with males?
It looks like males and females churn in similar proportions, so we would expect their survival curves to be almost the same as well.
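Counts like these come from tallying churned and total customers per group. The sketch below uses a handful of hypothetical (gender, churned) records in place of the real customer table, whose column names are not shown in this post; the helper name `churn_rate_by_group` is likewise made up for illustration. Raw proportions ignore how long each customer has been observed, which is why the comparison continues with survival curves.

```python
from collections import Counter

# Hypothetical records standing in for the real customer table:
# (gender, churned) pairs, where churned is True if the customer left.
customers = [
    ("Female", True), ("Female", False), ("Female", False), ("Female", True),
    ("Male", False), ("Male", True), ("Male", False), ("Male", True),
]


def churn_rate_by_group(records):
    """Return {group: (churned_count, total_count, churn_proportion)}."""
    totals = Counter(group for group, _ in records)
    churned = Counter(group for group, did_churn in records if did_churn)
    return {
        group: (churned[group], totals[group], churned[group] / totals[group])
        for group in totals
    }


for group, (n_churned, n_total, rate) in churn_rate_by_group(customers).items():
    print(f"{group}: {n_churned}/{n_total} churned ({rate:.0%})")
```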
The survival curves are indeed almost identical, and they also look the same as the survival curve of the whole population. This means we can expect both males and females to stay with the telco for around 9.2 years.
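The post does not say how the per-group curves were produced. One common way to estimate a survival curve for each group separately, without assuming a Weibull shape, is the Kaplan-Meier estimator, swapped in here purely as an illustration. At each churn time t it multiplies the running survival probability by (1 - d/n), where d is the number of churns at t and n is the number of customers still at risk; customers who have not churned (censored) stay in the risk set until their last observed time. The tenures and churn flags below are hypothetical.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Kaplan-Meier estimate of the survival curve S(t).

    durations: observed tenure of each customer (e.g. months)
    events:    True if the customer churned at that time, False if still
               subscribed (right-censored)
    Returns a list of (time, survival_probability) steps at each churn time.
    """
    churned_at = Counter(t for t, churned in zip(durations, events) if churned)
    leaving_at = Counter(durations)  # customers who churn OR are censored at t
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving_at):
        if churned_at[t]:
            survival *= 1.0 - churned_at[t] / at_risk
            curve.append((t, survival))
        # Both churned and censored customers leave the risk set after time t.
        at_risk -= leaving_at[t]
    return curve


# Hypothetical tenures (months) and churn flags for two groups.
female = ([5, 12, 12, 30, 40, 60], [True, True, False, True, False, False])
male = ([6, 12, 20, 30, 45, 60], [True, False, True, True, False, False])

for name, (durations, events) in {"Female": female, "Male": male}.items():
    steps = ", ".join(f"t={t}: {s:.2f}" for t, s in kaplan_meier(durations, events))
    print(f"{name}: {steps}")
```

Plotting these step functions for each group on the same axes is the usual way to eyeball whether two curves overlap, as they do for males and females here.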
Dependents tell a different story. Customers without dependents are expected to last around 7.5 years, while those with dependents last significantly longer, around 25 years. A log-rank test confirms that these survival curves differ from each other: at a significance level of α = 0.05 (95% confidence), the test should reject the null hypothesis that both groups share the same survival curve.
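The log-rank test compares, at every churn time, the number of churns observed in one group with the number expected if both groups shared a single survival curve, then sums these differences into a chi-squared statistic with one degree of freedom. Below is a minimal plain-Python sketch run on hypothetical tenures for customers with and without dependents; the function and variable names are made up for illustration. In practice a library routine such as `logrank_test` in `lifelines.statistics` does the same job.

```python
import math


def logrank_test(durations_a, events_a, durations_b, events_b):
    """Two-group log-rank test.

    durations_*: observed tenure of each customer (e.g. months)
    events_*:    True if the customer churned, False if still subscribed (censored)
    Returns (chi_squared, p_value); the statistic has 1 degree of freedom.
    """
    data = [(t, e, "a") for t, e in zip(durations_a, events_a)]
    data += [(t, e, "b") for t, e in zip(durations_b, events_b)]
    churn_times = sorted({t for t, e, _ in data if e})

    observed_a = expected_a = variance = 0.0
    for t in churn_times:
        # Customers still at risk just before time t (overall and in group a).
        n = sum(1 for d, _, _ in data if d >= t)
        n_a = sum(1 for d, _, g in data if d >= t and g == "a")
        # Churns happening exactly at time t (overall and in group a).
        d_all = sum(1 for d, e, _ in data if d == t and e)
        d_a = sum(1 for d, e, g in data if d == t and e and g == "a")

        observed_a += d_a
        expected_a += d_all * n_a / n
        if n > 1:
            # Hypergeometric variance of the group-a churn count at time t.
            variance += d_all * (n_a / n) * (1 - n_a / n) * (n - d_all) / (n - 1)

    if variance == 0:
        raise ValueError("not enough overlapping data to run the test")
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Chi-squared (1 df) upper tail: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical tenures in months; True = churned, False = still subscribed.
no_dep_durations = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
no_dep_events = [True] * 10
dep_durations = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65]
dep_events = [True, False] * 5

ALPHA = 0.05  # significance level (95% confidence)
chi2, p = logrank_test(no_dep_durations, no_dep_events, dep_durations, dep_events)
print(f"chi2 = {chi2:.2f}, p = {p:.2g}")
print("reject H0" if p < ALPHA else "fail to reject H0")
```

With these made-up numbers the customers without dependents churn far more often than the shared curve predicts, so the p-value falls well below 0.05 and the null hypothesis of equal survival curves is rejected.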
In case hypothesis testing and p-values sound arcane to you, here’s a quick guide from XKCD.