In every business, customers are one of your most important assets. It is always much easier to sell to existing customers than to new ones, so keeping your hard-earned customers should be a priority at all times. As getting new customers almost always costs you money, you want to minimize churn; you want to avoid pouring water into a “leaky bucket”.
That said, you also have to make sure that you are targeting the right type of customers. “The right type of customers” is a broad and overused cliché, and it usually takes a lot of time to find your Ideal Client Profile (ICP), even for businesses that generate millions in revenue.
In the following paragraphs I will show how to build a binary classification model using basic statistical methods that are easy to carry out in Excel. With a binary classification model we will be able to see how likely current customers are to churn, based on customers that have already churned. We will also be able to use the model to select future customers more effectively, based on a couple of select variables.
I have to point out that this method works best for businesses that operate with long-term contracts, where customers can inherently stay with the business for extended periods of time. In order to have enough data you also need at least 200–300 existing customers, so that there is something to build on.
If you are from Central or Eastern Europe, building a late-seed, Series A stage SaaS company and looking for funding, reach out to us or add me on Linkedin.
What is binary classification? Binary (0 or 1) classification means we classify a particular case as a “yes” or a “no”. The method was first used in the Second World War, where radar operators only had a very blurry radar image, but based on it they had to decide whether what they saw was a group of enemy bombers (1) or just seagulls (0). Based on this they would alert the anti-bomber aircraft, which is very costly to do but potentially life saving. With the help of a model they could set a “blur-size” threshold that allowed them to predict what they saw on the radar based on a single variable.
In our case, binary classification helps us classify customers into churning and non-churning categories. Essentially, the question we are trying to answer is: given some inputs, is this customer going to churn or not? This is done by producing a score out of the inputs and setting a threshold on that score above which the customer is considered likely to churn.
The goal of every binary classification model is to predict the actual outcome, before it occurs, as accurately as possible. As we have Actual Values and Predicted Values, we have the following cases:
If we put the actual outcomes (incidence) and the predicted outcomes in a matrix, we get what is known as the confusion matrix. On its Y axis we have the actual outcome (churned or not); on the X axis we have the predicted outcome produced by our model. Our goal is to minimize the occurrence of False Positives and False Negatives.
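For readers who like to see the mechanics spelled out, here is a minimal Python sketch of the same counting; the scores, outcomes and the 0.6 threshold are made-up placeholders, not values from the article.

```python
# Minimal sketch: count TP / FP / TN / FN for a given score threshold.
def confusion_matrix(scores, churned, threshold):
    tp = fp = tn = fn = 0
    for score, actual in zip(scores, churned):
        predicted = 1 if score >= threshold else 0  # 1 = predicted to churn
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif not predicted and not actual:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

scores  = [0.9, 0.4, 0.7, 0.2, 0.8]  # illustrative model scores, one per customer
churned = [1, 0, 1, 0, 0]            # actual outcomes (1 = churned)
print(confusion_matrix(scores, churned, threshold=0.6))  # -> (2, 1, 2, 0)
```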
To measure the effectiveness of our model, we can use the Receiver Operating Characteristic (ROC) curve and, more specifically, the Area Under the ROC Curve (AUC). To compile the ROC curve we check the True Positive and False Positive rates of our model at every possible threshold and plot them on the Y and X axes respectively. If you are interested in going deeper, check this excellent article. This results in a curve, and the area under this curve helps us assess how well our model works.
The AUC of a binary classification model varies between 0.5 and 1, the former being a random classifier (i.e. the model is worthless), the latter a perfect classifier that rarely exists in real life. A solid model usually has an AUC of 0.7–0.8.
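To make the ROC/AUC calculation less abstract, here is a small Python sketch of the procedure the spreadsheet performs: sweep every threshold, collect the (FPR, TPR) points and take the trapezoidal area under them. The data is illustrative.

```python
# Sweep every possible threshold, record (FPR, TPR), then take the area
# under the resulting curve with the trapezoidal rule.
def roc_points(scores, churned):
    positives = sum(churned)
    negatives = len(churned) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, churned) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, churned) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))   # (FPR, TPR)
    points.append((1.0, 1.0))
    return points

def auc(scores, churned):
    pts = sorted(roc_points(scores, churned))
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(pts, pts[1:]))

scores  = [0.9, 0.4, 0.7, 0.2, 0.8]   # illustrative model scores
churned = [1, 0, 1, 0, 0]             # actual outcomes
print(round(auc(scores, churned), 3)) # -> 0.833
```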
Every binary classification model has a dependent variable (DV) that is influenced by one or more independent variables (IVs). This is the basis of a model that looks at the IVs, predicts the DV and ultimately classifies each customer as churning or non-churning. This will help us assess our current customer base, but if we choose the right metrics it will also help us evaluate a customer before or during the actual sales process.
So, essentially we are creating a simple model to weed out the good customers from the bad ones and, over time, refine our Ideal Customer Profile (ICP) to the point where customers no longer churn due to non-ideal customer selection. It is important to note that this method is never going to be perfect: even customers that look ideal can churn due to product or other reasons, and vice versa.
Before we start collecting the data, we need to do a little thinking to figure out which variables would make good IVs. Ideally, we want customer data that is available before and during the sales process. Some ideas include:
There can be many other variables depending on your industry and niche. It is best to list these ideas and then apply the ultimate acid test: which of these do we have currently available, and which of these can we gather FOR EVERY SINGLE CUSTOMER? You should select as many data points as you can, possibly up to 10, as some of them (maybe all of them) will turn out not to be relevant.
This part is super important, as the model will only be as good as the quality of the data behind it. If you have been using a CRM, most of this data should be available there, but you may have to cross-reference it with other sources. You can also use tools like Clearbit, Zoominfo, DataFinder or any other data enrichment tool to get more data points to work with.
Once you have figured out the data points you want to gather, it is time to compile them into a dataset.
In order to carry out a meaningful analysis, we need as much data as possible. To get any meaningful results we need at least a couple hundred customers, which we will randomly split into two groups: the Training Set (around 70% of the data) and the Test Set (30% of the data). The Training Set will be used to build our model. After we have built the model, we can use the Test Set to measure its effectiveness and see whether it changes in any significant way.
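In Excel you can do this by adding a helper column with RAND(), sorting on it and cutting the list at 70%. If you prefer to see it as code, here is a minimal Python sketch of the same split; the customer records are made up.

```python
import random

random.seed(42)  # fixed seed so the split is reproducible

# Made-up customer records; in practice each row comes from your CRM export.
customers = [{"id": i, "churned": random.randint(0, 1)} for i in range(300)]

random.shuffle(customers)
cut = int(len(customers) * 0.7)
training_set = customers[:cut]   # ~70%: used to build the model
test_set     = customers[cut:]   # ~30%: held back to validate it
print(len(training_set), len(test_set))   # 210 90
```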
Our job is to create a model that combines our variables into a single score. After that we will set a threshold on this score, which will be used to predict whether a particular customer is likely to churn or not.
For this we need to clean the data as much as possible and select the IVs that are most relevant in predicting our outcome.
One note here: we can actually include text-based variables as well and convert them to numeric values by running a pivot on the data, copying the options from the pivot table and creating a lookup table that assigns a numeric value to each text field. (See the Lookup tables tab.)
You can also create variables from date-type fields by performing calculations on them, for example subtracting two dates from each other (like the number of days elapsed between the creation of the customer and their last funding date, or between the creation date and today). This way you can make use of date-type variables as well.
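Here is a minimal Python sketch of both tricks, with made-up field names: a lookup table that turns a text field into a numeric code, and date arithmetic that turns dates into “days elapsed” features.

```python
from datetime import date

# A lookup table that assigns a numeric code to each text value...
industry_lookup = {"SaaS": 1, "E-commerce": 2, "Fintech": 3}

# ...and a made-up customer record with two date fields.
customer = {
    "industry": "Fintech",
    "created_at": date(2020, 3, 1),
    "last_funding": date(2021, 6, 15),
}

industry_code      = industry_lookup[customer["industry"]]
days_to_funding    = (customer["last_funding"] - customer["created_at"]).days
days_since_created = (date.today() - customer["created_at"]).days
print(industry_code, days_to_funding, days_since_created)
```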
This is actually the hardest part: we have selected the IVs, gathered them from across systems, and we have our DV, the dummy variable (0 or 1) showing whether a customer has churned or not. How do we make sure that the data points we will use in our model actually explain churn at all?
Step 1
First, it is good practice to look at the outliers and replace them with a more analysis-friendly value, like the median or mean of the remaining values. Usually the lower and upper 1–2% of values are replaced this way, making it easier to build the model. You can use a histogram to see what these values are and a data filter to delete them. Then you can calculate the median or mean from the rest of the values and fill the now-empty cells with it.
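As an illustration, here is a small Python sketch of that outlier treatment; the percentile cut-offs and the sample values are only examples.

```python
from statistics import median

# Replace values outside the lower/upper 1% band with the median of the rest.
def replace_outliers(values, lower_pct=0.01, upper_pct=0.99):
    ranked = sorted(values)
    lo = ranked[int(lower_pct * (len(ranked) - 1))]
    hi = ranked[int(upper_pct * (len(ranked) - 1))]
    kept = [v for v in values if lo <= v <= hi]
    fill = median(kept)
    return [v if lo <= v <= hi else fill for v in values]

site_visits = [120, 95, 110, 88, 130, 10500, 105, 98]   # 10500 looks like an outlier
print(replace_outliers(site_visits))
```

With only a handful of rows the percentile cut-offs are crude; on a few hundred customers they trim the extreme 1–2% at each end as described above.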
Step 2
To determine which IVs will be useful, we will run a correlation analysis between the IVs and the DV. We can do this a couple of different ways. We will start working with the Training Set first, using our variables as they are. To understand correlations (a movement in one variable explaining, to some extent, the movement in another variable) we can use a couple of different tools:
Based on the above, you can safely exclude IVs that have little or no correlation with the DV or that produce low AUC values. This may leave you with only 2–3 IVs, which is frustrating given that you might have started with 20–30, but hey, at least you found something that can work!
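For reference, the correlation check is the same calculation Excel's CORREL function performs. A minimal Python sketch, with made-up variables, looks like this (statistics.correlation requires Python 3.10+):

```python
from statistics import correlation   # requires Python 3.10+

# Made-up IVs and churn dummy; correlation() does what CORREL does in Excel.
churned      = [1, 0, 1, 0, 0, 1, 0, 1]
discount_pct = [30, 5, 25, 0, 10, 40, 5, 35]
employee_cnt = [12, 300, 8, 150, 90, 5, 220, 15]

for name, values in [("discount_pct", discount_pct), ("employee_cnt", employee_cnt)]:
    print(name, round(correlation(values, churned), 2))
```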
Before we start creating the model, we can do one more thing to help our work (something we actually could have done before running the correlations as well): standardize our data.
Standardizing is helpful because it lets you combine different types of variables more easily while retaining their information value.
For standardization you will need to calculate the Mean and Standard Deviation of each IV and then use the STANDARDIZE function to create the standardized values. This lets you combine values that are on very different scales (like website visitor numbers and percentages) in one model. You can re-run the correlations as a sanity check; standardizing alone does not change the correlation coefficients, but it makes the IVs much easier to combine into a single score.
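For clarity, this is all the STANDARDIZE function does; a minimal Python equivalent, with an illustrative input, would be:

```python
from statistics import mean, stdev

# (value - mean) / standard deviation, i.e. what STANDARDIZE returns cell by cell.
def standardize(values):
    mu, sigma = mean(values), stdev(values)
    return [(v - mu) / sigma for v in values]

site_visits = [120, 95, 110, 88, 130, 105]
print([round(z, 2) for z in standardize(site_visits)])
```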
After so much time spent preparing the data, let's get down to the most important part: creating our predictive model. A “model” in this case is just a score created from the IVs in some way. You can try adding, subtracting, multiplying or weighting variables, whatever works; the goal is to get the AUC as high as possible. This is more art than science. You are free to combine the IVs in any way you want, standardized or not, but you have one goal: get the AUC as high as possible, and in any case higher than 0.5, otherwise your model is worthless.
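As a concrete (and entirely made-up) example of what “a score created from the IVs” can look like, here is a sketch of a simple weighted sum of two standardized variables, evaluated with the AUC. The weights and field names are guesses you would tune by hand, not the article's model.

```python
# A made-up "model": a weighted sum of two standardized IVs, scored with AUC.
def auc(scores, churned):
    pos = [s for s, y in zip(scores, churned) if y == 1]
    neg = [s for s, y in zip(scores, churned) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

customers = [
    {"discount_z": 1.2,  "employees_z": -0.8, "churned": 1},
    {"discount_z": -0.5, "employees_z": 1.1,  "churned": 0},
    {"discount_z": 0.9,  "employees_z": -0.3, "churned": 1},
    {"discount_z": -1.1, "employees_z": 0.6,  "churned": 0},
]

def score(c, w_discount=1.0, w_employees=-0.5):   # weights to tweak by hand
    return w_discount * c["discount_z"] + w_employees * c["employees_z"]

scores  = [score(c) for c in customers]
churned = [c["churned"] for c in customers]
print(round(auc(scores, churned), 2))  # tweak the weights until this stops improving
```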
You can use this spreadsheet to test your model scores. Just insert your scores into Column A and the actual outcomes (0 and 1) into Column B, and the spreadsheet will spit out the AUC*. You can also insert the cost of a FP and a FN and get the optimal threshold, as we will explain later in the article.
Play around with your model until you think you have achieved the maximum AUC.
Once we think we have achieved the maximum AUC and nothing we do pushes it any higher, we need to come up with the threshold that minimizes the number of False Positives and False Negatives. The score threshold separates churning from non-churning customers, so it is important to set it in a way that is helpful to our business.
We can do this two ways:
In the case of SaaS companies, my view is that FPs (customers identified as likely to churn but who in fact do not) are less of a problem than FNs (customers not identified as likely to churn but who in fact do churn), but of course it really depends on the size of the market you are targeting.
This relates to our market size as well: if you have a lot of potential customers, you can be more selective about who you sell to, so you lose less on a FP than on a FN. Of course, if you are limited by the market or some other factor, losing a customer is as bad as not acquiring one.
This will also depend on when you are planning to use this model: during sales, to assess the customer potential, or as a Customer Success tool to assess if the customer is going to churn or not.
Based on this, you need to set the FP and FN costs to your liking in the AUC calculator, and it will give you the score threshold you can use to classify customers.
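If you want to see the threshold search spelled out, here is a minimal Python sketch of the cost-based version: try every candidate threshold and keep the one with the lowest total cost of false positives and false negatives. The costs and data are illustrative (the FN cost is set higher, in line with the SaaS argument above).

```python
# Try every candidate threshold and keep the one with the lowest total cost.
def best_threshold(scores, churned, cost_fp, cost_fn):
    best = None
    for t in sorted(set(scores)):
        fp = sum(1 for s, y in zip(scores, churned) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, churned) if s < t and y == 1)
        cost = fp * cost_fp + fn * cost_fn
        if best is None or cost < best[1]:
            best = (t, cost)
    return best

scores  = [0.9, 0.4, 0.7, 0.2, 0.8, 0.3]
churned = [1,   0,   1,   0,   0,   1]
print(best_threshold(scores, churned, cost_fp=100, cost_fn=500))   # -> (0.3, 200)
```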
Once you have your model and your threshold, it is time to feed it the data you set aside earlier (around 30% of your original data, the Test Set) and calculate the AUC on that as well.
It is normal to see a small drop in the AUC on the new data; however, a significant drop of 0.1 or more means that you have probably overfitted the model to the Training Set. This means that your model works well for the particular cases in the Training Set, but when fed new data it produces worse results. If this happens, you will need to go back and rework the model so that the drop in AUC is less significant.
Using the above-mentioned guidelines, I used real live data to set up a model. You can see the file here. This particular model, as much as I wanted it to, could not produce an AUC above 0.69, which means that it is not a particularly effective one, unfortunately :).
I still wanted to show it here, as it illustrates the challenge that Data Analysts and Data Scientists face every day: working with incomplete data and struggling to find a good predictive model for some use cases. This should not discourage you from creating your own models and using them to select and retain the right type of customers!
*Source: Duke University, Mastering Data Analysis in Excel Course