What marketing campaigns could help reduce customer churn?
python
altair
EDA
Author
Sandra Jurela
Published
June 10, 2023
Introduction
In this hypothetical case study I’ve been hired as a financial analyst by the marketing department within a bank.
They are faced with more and more customers leaving their credit card services. They would like to understand what characteristics lend themselves to someone who is going to churn so they can proactively go to the customer to provide them better services and turn customers’ decisions in the opposite direction.
As a part of my first analysis, they’ve asked me to take a look at the available data and help them understand how to increase customer retention.
Problem Statement
What marketing campaigns could we implement to help reduce customer churn?
Imports
import pandas as pdimport altair as alt# Show all columns (instead of cascading columns in the middle)pd.set_option("display.max_columns", None)# Don't show numbers in scientific notationpd.set_option("display.float_format", "{:.2f}".format)# Enable plotting more than 5000 pointsalt.data_transformers.disable_max_rows()
DataTransformerRegistry.enable('default')
I will use Altair for data visualizations. It’s a declarative statistical visualization library for Python, built on top of the powerful Vega-Lite grammar, It’s very intuitive to use, and after some experience with Matplotlib and Seaborn, I’m pleasantly surprised with it. 😇
Data Overview
The dataset consists of about 10,000 customers described with their age, income level, marital status, credit card limit, credit card category, etc. It’s available on Kaggle at the following link.
Data Dictionary
CLIENTNUM - Unique identifier for the customer holding the account
Attrition_Flag - If the account is closed then Attrited Customer, else Existing Customer
Customer_Age - Customer’s age in years
Gender - Male = M, Female = F
Dependent_count - Number of dependents
Education_Level - Educational qualification of the account holder (example: high school, college graduate, etc.)
Marital_Status - Married, Single, Divorced, Unknown
Income_Category - Annual income category of the account holder (Less than $40K, $40K-$60K, $60K-$80K, $80K-$120K, $120K +
Card_Category - Type of card (Blue, Silver, Gold, Platinum)
Months_on_book - Period of relationship with bank
Total_Relationship_count - Total number of products held by the customer
Months_Inactive_12_mon - Number of months inactive in the last 12 months
Contacts_Count_12_mon - Number of contacts in the last 12 months
Credit_Limit - Credit limit on the credit card
Total_Revolving_Bal - Total revolving balance on the credit card
Avg_Open_To_Buy- Open to buy credit line (average of last 12 months)
Total_Amt_Chng_Q4_Q1 - Change in transaction amount (Q4 over Q1)
Total_Trans_Amt - Total transaction amount (last 12 months)
Total_Trans_Ct - Total transaction count (last 12 months)
Total_Ct_Chng_Q4_Q1 - Change in transaction count (Q4 over Q1)
Avg_Utilization_Ratio - Average card utilization ratio
# Read in the data and remove the last two columns irrelevant for the analysisdata = pd.read_csv("data/BankChurners.csv").iloc[: , :-2]data.head()
CLIENTNUM
Attrition_Flag
Customer_Age
Gender
Dependent_count
Education_Level
Marital_Status
Income_Category
Card_Category
Months_on_book
Total_Relationship_Count
Months_Inactive_12_mon
Contacts_Count_12_mon
Credit_Limit
Total_Revolving_Bal
Avg_Open_To_Buy
Total_Amt_Chng_Q4_Q1
Total_Trans_Amt
Total_Trans_Ct
Total_Ct_Chng_Q4_Q1
Avg_Utilization_Ratio
0
768805383
Existing Customer
45
M
3
High School
Married
$60K - $80K
Blue
39
5
1
3
12691.00
777
11914.00
1.33
1144
42
1.62
0.06
1
818770008
Existing Customer
49
F
5
Graduate
Single
Less than $40K
Blue
44
6
1
2
8256.00
864
7392.00
1.54
1291
33
3.71
0.10
2
713982108
Existing Customer
51
M
3
Graduate
Married
$80K - $120K
Blue
36
4
1
0
3418.00
0
3418.00
2.59
1887
20
2.33
0.00
3
769911858
Existing Customer
40
F
4
High School
Unknown
Less than $40K
Blue
34
3
4
1
3313.00
2517
796.00
1.41
1171
20
2.33
0.76
4
709106358
Existing Customer
40
M
3
Uneducated
Married
$60K - $80K
Blue
21
5
1
0
4716.00
0
4716.00
2.17
816
28
2.50
0.00
data.shape
(10127, 21)
We have a total of 10127 customers described with 21 attributes.
This dataset is actually very clean. We’ll only do some preprocessing steps before the analysis.
Data Preprocessing
Binning Customer Age
We’ll create specific age categories and place ages within a range in these categories. For instance, category 20s will contain ages between 20 and 29, and so on.
# looking for min and max ageprint(data['Customer_Age'].min())print(data['Customer_Age'].max())
It looks pretty normal outside of a very strong peak at ~36 months (roughly 2800 customers). It might be a data error (default months for those without a record), or there was a major marketing campaign that brought in a lot of users 36 months ago.
This plot quickly shows us that the median transaction count is slightly higher for females than males and outliers seem to be present among both groups.
Pyramid chart
Was there a difference in the distribution of customers by age and gender? A pyramid chart can help us find the answer.
To get the data in the format we need, we’ll aggregate the data up to the age and gender level.
Here we see that there is no real difference in the distribution of customers by age and gender. The majority of customers are in their 40s. We have only two male customers in their 70s, which is hardly visible here.
📊 Categorical variables
Next, we’ll see how attrited and existing customers are represented in all category groups.
# get the list of categorical variables, remove `Attrition_Flag`cat_list =sorted((list(set(data.columns) -set(data._get_numeric_data().columns))))[1:]cat_list
Nothing strange here. Churned customers are almost equally distributed across all category groups.
Individual values for customers with a platinum credit card and the age group 70s deviate from the average due to the small number of customers in these groups.
📊 Comparing Categories
We will look at the average Credit_Limit by age group.
bar = alt.Chart(data).mark_bar().encode( alt.X('Customer_Age_bins', axis=alt.Axis(labelAngle=-0), title='Customer age group'), alt.Y('mean(Credit_Limit)'), alt.Color('Customer_Age_bins', legend=None)).properties(width=300)error_bars = alt.Chart(data).mark_errorbar(extent='ci').encode( x='Customer_Age_bins', y='Credit_Limit', strokeWidth=alt.value(2))bar + error_bars
No. Men and women are equally educated. But compared to women in general, men are clearly more represented in better-paid jobs and in better-paid positions.
📝 Pivot table
Coming back to the business problem, we want to understand something about customers who are leaving. Let’s see if we can see anything obvious about their behavior.
(data .groupby(['Attrition_Flag']) .agg({'CLIENTNUM':'nunique', # number of unique customers in each group'Customer_Age': 'mean', # the rest are looking at the mean per group'Dependent_count': 'mean','Months_on_book': 'mean','Total_Relationship_Count': 'mean','Months_Inactive_12_mon': 'mean','Contacts_Count_12_mon': 'mean','Credit_Limit': 'mean','Total_Revolving_Bal': 'mean','Avg_Open_To_Buy': 'mean','Total_Amt_Chng_Q4_Q1': 'mean','Total_Trans_Amt': 'mean','Total_Trans_Ct': 'mean','Total_Ct_Chng_Q4_Q1': 'mean','Avg_Utilization_Ratio': 'mean','Avg_Transaction_Value': 'mean',}) .transpose() .assign(Diff = (lambda x: (x['Attrited Customer'] / x['Existing Customer'] -1))) .sort_values('Diff') .rename_axis(None, axis=1))
Attrited Customer
Existing Customer
Diff
CLIENTNUM
1627.00
8500.00
-0.81
Total_Revolving_Bal
672.82
1256.60
-0.46
Avg_Utilization_Ratio
0.16
0.30
-0.45
Total_Trans_Ct
44.93
68.67
-0.35
Total_Trans_Amt
3095.03
4654.66
-0.34
Total_Ct_Chng_Q4_Q1
0.55
0.74
-0.25
Total_Relationship_Count
3.28
3.91
-0.16
Total_Amt_Chng_Q4_Q1
0.69
0.77
-0.10
Credit_Limit
8136.04
8726.88
-0.07
Avg_Open_To_Buy
7463.22
7470.27
-0.00
Months_on_book
36.18
35.88
0.01
Customer_Age
46.66
46.26
0.01
Avg_Transaction_Value
63.59
62.43
0.02
Dependent_count
2.40
2.34
0.03
Months_Inactive_12_mon
2.69
2.27
0.18
Contacts_Count_12_mon
2.97
2.36
0.26
We can see that the Total_Revolving_Bal and Avg_Utilization_Ratio show the biggest differences (45% below existing customers), along with Total_Trans_Ct and Total_Trans_Amt (35% below existing customers), and Total_Ct_Chng_Q4_Q1 (25% below existing customers).
Average Card Utilization Ratio
It represents the amount of revolving credit customer is using divided by the total credit available to them.
Let’s look at the Avg_Utilization_Ratio distribution.
We can see that 57% (934) of churned customers have the average utilization ratio less than 2.5%.
How many churned customers have a credit card utilization ratio of zero?
data.query('(Avg_Utilization_Ratio == 0) and (Attrition_Flag=="Attrited Customer")').shape[0]
893
There are 893 (55%) churned customers whose credit card utilization ratio is zero, which means they pay off their balances in full before the statement due date.
📊 Transaction Amounts and Counts relationship
Scatter plot with Transaction Amounts and Counts looks interesting.
This plot clearly shows three distinct groups. Will anything show up if we add Attrition_Flag as a color? We’ll also add histograms for transaction amounts and counts.
In terms of average credit limit, churned customers with transaction amounts between $4K and $11K are more like existing customers with transaction amounts above $11K (very similar average credit limit, around $14K). Simply put, they spent less. And as a result, the translation of those points (churned customers) to the left and downward may be seen on the previous graph.
📊 Can we influence the Q4 to Q1 dip?
The Q4-Q1 change (Total_Ct_Chng_Q4_Q1) is significantly different for churned customers too! This indicates that churned customers are spending significantly less (-25% lower than their counterparts) after the holiday season, pulling back much more sigificantly. If we can impact this variable, we can also directly impact the Total Transaction Count!
alt.Chart(data).transform_density('Total_Ct_Chng_Q4_Q1', groupby=['Attrition_Flag'], as_=['Total_Ct_Chng_Q4_Q1', 'density']).mark_area(opacity=0.7, clip=True).encode( alt.X('Total_Ct_Chng_Q4_Q1', scale=alt.Scale(domain=[0, 2]), title='Total Transaction Count Change (Q4 to Q1)'), alt.Y('density:Q', scale=alt.Scale(domain=[0, 3])), alt.Color('Attrition_Flag', sort='descending')).properties(title='Change in Transaction Count (Q4 over Q1)', width=500, height=300).configure_title(fontSize=16, dy=-10, anchor='start')
Promotions during Q1 time to keep the spending levels high. The more we spend the more attached we feel to the card, it’s benefits, etc. A “prevent the cliff” campaign where all customers who have historically shown a strong drop off, will get targeted with this promo.
Customer surveys. For those who are spending above $11,000 to understand why they love our card and what keeps them around. If we were also able to get responses from some of our churn customers, we can develop stronger marketing campaigns around our findings.
Offer loyalty points, cash back, etc.
Look at any historical marketing campaigns to see what we can learn from what worked / didn’t work.