Sandra Jurela - Bank Customer Retention Analysis in Python

Introduction

In this hypothetical case study I’ve been hired as a financial analyst by the marketing department within a bank.

They are faced with more and more customers leaving their credit card services. They would like to understand what characteristics lend themselves to someone who is going to churn so they can proactively go to the customer to provide them better services and turn customers’ decisions in the opposite direction.

As a part of my first analysis, they’ve asked me to take a look at the available data and help them understand how to increase customer retention.

Problem Statement

What marketing campaigns could we implement to help reduce customer churn?

Imports

import pandas as pd
import altair as alt

# Show all columns (instead of cascading columns in the middle)
pd.set_option("display.max_columns", None)
# Don't show numbers in scientific notation
pd.set_option("display.float_format", "{:.2f}".format)

# Enable plotting more than 5000 points
alt.data_transformers.disable_max_rows()

DataTransformerRegistry.enable('default')

I will use Altair for data visualizations. It’s a declarative statistical visualization library for Python, built on top of the powerful Vega-Lite grammar, It’s very intuitive to use, and after some experience with Matplotlib and Seaborn, I’m pleasantly surprised with it. 😇

Data Overview

The dataset consists of about 10,000 customers described with their age, income level, marital status, credit card limit, credit card category, etc. It’s available on Kaggle at the following link.

Data Dictionary

CLIENTNUM - Unique identifier for the customer holding the account
Attrition_Flag - If the account is closed then Attrited Customer, else Existing Customer
Customer_Age - Customer’s age in years
Gender - Male = M, Female = F
Dependent_count - Number of dependents
Education_Level - Educational qualification of the account holder (example: high school, college graduate, etc.)
Marital_Status - Married, Single, Divorced, Unknown
Income_Category - Annual income category of the account holder (Less than $40K, $40K-$60K, $60K-$80K, $80K-$120K, $120K +
Card_Category - Type of card (Blue, Silver, Gold, Platinum)
Months_on_book - Period of relationship with bank
Total_Relationship_count - Total number of products held by the customer
Months_Inactive_12_mon - Number of months inactive in the last 12 months
Contacts_Count_12_mon - Number of contacts in the last 12 months
Credit_Limit - Credit limit on the credit card
Total_Revolving_Bal - Total revolving balance on the credit card
Avg_Open_To_Buy- Open to buy credit line (average of last 12 months)
Total_Amt_Chng_Q4_Q1 - Change in transaction amount (Q4 over Q1)
Total_Trans_Amt - Total transaction amount (last 12 months)
Total_Trans_Ct - Total transaction count (last 12 months)
Total_Ct_Chng_Q4_Q1 - Change in transaction count (Q4 over Q1)
Avg_Utilization_Ratio - Average card utilization ratio

# Read in the data and remove the last two columns irrelevant for the analysis
data = pd.read_csv("data/BankChurners.csv").iloc[: , :-2]

data.head()

	CLIENTNUM	Attrition_Flag	Customer_Age	Gender	Dependent_count	Education_Level	Marital_Status	Income_Category	Card_Category	Months_on_book	Total_Relationship_Count	Months_Inactive_12_mon	Contacts_Count_12_mon	Credit_Limit	Total_Revolving_Bal	Avg_Open_To_Buy	Total_Amt_Chng_Q4_Q1	Total_Trans_Amt	Total_Trans_Ct	Total_Ct_Chng_Q4_Q1	Avg_Utilization_Ratio
0	768805383	Existing Customer	45	M	3	High School	Married	$60K - $80K	Blue	39	5	1	3	12691.00	777	11914.00	1.33	1144	42	1.62	0.06
1	818770008	Existing Customer	49	F	5	Graduate	Single	Less than $40K	Blue	44	6	1	2	8256.00	864	7392.00	1.54	1291	33	3.71	0.10
2	713982108	Existing Customer	51	M	3	Graduate	Married	$80K - $120K	Blue	36	4	1	0	3418.00	0	3418.00	2.59	1887	20	2.33	0.00
3	769911858	Existing Customer	40	F	4	High School	Unknown	Less than $40K	Blue	34	3	4	1	3313.00	2517	796.00	1.41	1171	20	2.33	0.76
4	709106358	Existing Customer	40	M	3	Uneducated	Married	$60K - $80K	Blue	21	5	1	0	4716.00	0	4716.00	2.17	816	28	2.50	0.00

data.shape

(10127, 21)

We have a total of 10127 customers described with 21 attributes.

data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10127 entries, 0 to 10126
Data columns (total 21 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   CLIENTNUM                 10127 non-null  int64  
 1   Attrition_Flag            10127 non-null  object 
 2   Customer_Age              10127 non-null  int64  
 3   Gender                    10127 non-null  object 
 4   Dependent_count           10127 non-null  int64  
 5   Education_Level           10127 non-null  object 
 6   Marital_Status            10127 non-null  object 
 7   Income_Category           10127 non-null  object 
 8   Card_Category             10127 non-null  object 
 9   Months_on_book            10127 non-null  int64  
 10  Total_Relationship_Count  10127 non-null  int64  
 11  Months_Inactive_12_mon    10127 non-null  int64  
 12  Contacts_Count_12_mon     10127 non-null  int64  
 13  Credit_Limit              10127 non-null  float64
 14  Total_Revolving_Bal       10127 non-null  int64  
 15  Avg_Open_To_Buy           10127 non-null  float64
 16  Total_Amt_Chng_Q4_Q1      10127 non-null  float64
 17  Total_Trans_Amt           10127 non-null  int64  
 18  Total_Trans_Ct            10127 non-null  int64  
 19  Total_Ct_Chng_Q4_Q1       10127 non-null  float64
 20  Avg_Utilization_Ratio     10127 non-null  float64
dtypes: float64(5), int64(10), object(6)
memory usage: 1.6+ MB

This dataset is actually very clean. We’ll only do some preprocessing steps before the analysis.

Data Preprocessing

Binning Customer Age

We’ll create specific age categories and place ages within a range in these categories. For instance, category 20s will contain ages between 20 and 29, and so on.

# looking for min and max age
print(data['Customer_Age'].min())
print(data['Customer_Age'].max())

26
73

bins = [20, 30, 40, 50, 60, 70, 80]
labels = ['20s', '30s', '40s', '50s', '60s', '70s']

data['Customer_Age_bins'] = pd.cut(
    data['Customer_Age'], 
    bins=bins, 
    labels=labels, 
    include_lowest=True, 
    right=False
)

Average Transaction Value

The average transaction value is the average amount a customer spends on a single purchase.

We can calculate it by dividing the Total_Trans_Amt by the Total_Trans_Ct.

data['Avg_Transaction_Value'] = data['Total_Trans_Amt'] / data['Total_Trans_Ct']

Converting data types for memory optimization

data.info(verbose=False, show_counts=False, memory_usage='deep')

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10127 entries, 0 to 10126
Columns: 23 entries, CLIENTNUM to Avg_Transaction_Value
dtypes: category(1), float64(6), int64(10), object(6)
memory usage: 5.0 MB

Currently, the total deep memory usage of the DataFrame is 5 MB. We’ll reduce it with smaller data types.

data = data.astype(
    {   
        'Attrition_Flag': 'category',    # categories
        'Gender': 'category',           
        'Education_Level': 'category', 
        'Marital_Status': 'category', 
        'Income_Category': 'category', 
        'Card_Category': 'category',
        'Customer_Age': 'int8',          # integers
        'Dependent_count': 'int8',
        'Months_on_book': 'int8',
        'Total_Relationship_Count':'int8',
        'Months_Inactive_12_mon':'int8',
        'Contacts_Count_12_mon':'int8',
        'Total_Revolving_Bal':'int16',
        'Total_Trans_Amt':'int16',
        'Total_Trans_Ct':'uint8',             
        'Credit_Limit': 'float32',       # floats
        'Avg_Open_To_Buy': 'float32', 
        'Total_Amt_Chng_Q4_Q1': 'float16', 
        'Total_Ct_Chng_Q4_Q1': 'float16', 
        'Avg_Utilization_Ratio': 'float16',
        'Avg_Transaction_Value': 'float32'
    }
)

data.info(memory_usage='deep')

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10127 entries, 0 to 10126
Data columns (total 23 columns):
 #   Column                    Non-Null Count  Dtype   
---  ------                    --------------  -----   
 0   CLIENTNUM                 10127 non-null  int64   
 1   Attrition_Flag            10127 non-null  category
 2   Customer_Age              10127 non-null  int8    
 3   Gender                    10127 non-null  category
 4   Dependent_count           10127 non-null  int8    
 5   Education_Level           10127 non-null  category
 6   Marital_Status            10127 non-null  category
 7   Income_Category           10127 non-null  category
 8   Card_Category             10127 non-null  category
 9   Months_on_book            10127 non-null  int8    
 10  Total_Relationship_Count  10127 non-null  int8    
 11  Months_Inactive_12_mon    10127 non-null  int8    
 12  Contacts_Count_12_mon     10127 non-null  int8    
 13  Credit_Limit              10127 non-null  float32 
 14  Total_Revolving_Bal       10127 non-null  int16   
 15  Avg_Open_To_Buy           10127 non-null  float32 
 16  Total_Amt_Chng_Q4_Q1      10127 non-null  float16 
 17  Total_Trans_Amt           10127 non-null  int16   
 18  Total_Trans_Ct            10127 non-null  uint8   
 19  Total_Ct_Chng_Q4_Q1       10127 non-null  float16 
 20  Avg_Utilization_Ratio     10127 non-null  float16 
 21  Customer_Age_bins         10127 non-null  category
 22  Avg_Transaction_Value     10127 non-null  float32 
dtypes: category(7), float16(3), float32(3), int16(2), int64(1), int8(6), uint8(1)
memory usage: 438.4 KB

We can see at the bottom that the memory usage is now only 438.4 KB (10 times lower).

Exploratory Data Analysis (EDA)

🔍 Percentage of churned customers

data['Attrition_Flag'].value_counts()

Existing Customer    8500
Attrited Customer    1627
Name: Attrition_Flag, dtype: int64

We’ll use normalize=True to get the share of churned and existing customers.

churned = data['Attrition_Flag'].value_counts(normalize=True)['Attrited Customer']

print(round(churned * 100 , 2), '% of customers have churned.')

16.07 % of customers have churned.

📝 Summary Statistics

data.describe()

	CLIENTNUM	Customer_Age	Dependent_count	Months_on_book	Total_Relationship_Count	Months_Inactive_12_mon	Contacts_Count_12_mon	Credit_Limit	Total_Revolving_Bal	Avg_Open_To_Buy	Total_Amt_Chng_Q4_Q1	Total_Trans_Amt	Total_Trans_Ct	Total_Ct_Chng_Q4_Q1	Avg_Utilization_Ratio	Avg_Transaction_Value
count	10127.00	10127.00	10127.00	10127.00	10127.00	10127.00	10127.00	10127.00	10127.00	10127.00	10127.00	10127.00	10127.00	10127.00	10127.00	10127.00
mean	739177606.33	46.33	2.35	35.93	3.81	2.34	2.46	8632.00	1162.81	7469.17	0.76	4404.09	64.86	0.71	0.27	62.61
std	36903783.45	8.02	1.30	7.99	1.55	1.01	1.11	9088.76	814.99	9090.68	0.22	3397.13	23.47	0.24	0.28	26.40
min	708082083.00	26.00	0.00	13.00	1.00	0.00	0.00	1438.30	0.00	3.00	0.00	510.00	10.00	0.00	0.00	19.14
25%	713036770.50	41.00	1.00	31.00	3.00	2.00	2.00	2555.00	359.00	1324.50	0.63	2155.50	45.00	0.58	0.02	47.51
50%	717926358.00	46.00	2.00	36.00	4.00	2.00	2.00	4549.00	1276.00	3474.00	0.74	3899.00	67.00	0.70	0.18	55.79
75%	773143533.00	52.00	3.00	40.00	5.00	3.00	3.00	11067.50	1784.00	9859.00	0.86	4741.00	81.00	0.82	0.50	65.48
max	828343083.00	73.00	5.00	56.00	6.00	6.00	6.00	34516.00	2517.00	34516.00	3.40	18484.00	139.00	3.71	1.00	190.19

Here we can see:

The longest customer in this dataset has been around for 56 months, or about 4 years and a half. (Max)
The average number of relationships a customer has is 4. (Mean and median agree here.)
The average credit limit is $8.6K, but the median credit limit is much lower at $4.5K. (signals some skew in this variable)

📊 Distributions

Histograms

Let’s take a look at the distribution for the Months_on_book variable.

alt.Chart(data).mark_bar().encode(   
    alt.X('Months_on_book', bin=alt.Bin(maxbins=30), title="Months on book"),
    alt.Y('count()')
).properties(width=500)

Save as SVG Save as PNG View Source View Compiled Vega Open in Vega Editor

It looks pretty normal outside of a very strong peak at ~36 months (roughly 2800 customers). It might be a data error (default months for those without a record), or there was a major marketing campaign that brought in a lot of users 36 months ago.

Now let’s take a look at the Credit_Limit.

hist = alt.Chart(data).mark_bar().encode(   
    alt.X('Credit_Limit', bin=alt.Bin(step=1500), title="Credit Limit"),
    alt.Y('count()')
).properties(width=500)

aggregates = alt.Chart(data).transform_aggregate(
    mean='mean(Credit_Limit)',
    median='median(Credit_Limit)',
).transform_fold(
    ['mean', 'median']
).mark_rule().encode(
    x='value:Q',
    color='key:N',
    size=alt.value(2)
)

hist + aggregates

Save as SVG Save as PNG View Source View Compiled Vega Open in Vega Editor

It’s clear that the Credit_Limit is a skewed variable, being skewed higher (or to the right) by a few customers with high credit limits.

Boxplot

Let’s look at the total number of transactions by gender.

color_scale = alt.Scale(domain=['M', 'F'],
                        range=['#1f77b4', '#e377c2'])
                        
alt.Chart(data).mark_boxplot(size=25).encode(
    alt.X('Gender', axis=alt.Axis(labelAngle=-0)),
    alt.Y('Total_Trans_Ct'),
    alt.Color('Gender', scale=color_scale)
).properties(width=150)

Save as SVG Save as PNG View Source View Compiled Vega Open in Vega Editor

This plot quickly shows us that the median transaction count is slightly higher for females than males and outliers seem to be present among both groups.

Pyramid chart

Was there a difference in the distribution of customers by age and gender? A pyramid chart can help us find the answer.

To get the data in the format we need, we’ll aggregate the data up to the age and gender level.

pyramid_df = (data
    .groupby(['Gender', 'Customer_Age_bins'])
    ['CLIENTNUM']
    .nunique()
    .reset_index()
)

left = alt.Chart(pyramid_df[pyramid_df['Gender']=="F"]).mark_bar().encode(
    alt.Y('Customer_Age_bins', axis=None),
    alt.X('CLIENTNUM', title='Customers', sort='descending'),
    alt.Color('Gender', scale=color_scale, legend=None)
).properties(title='Female', width=350)

middle = alt.Chart(pyramid_df).mark_text().encode(
    alt.Y('Customer_Age_bins', axis=None),
    alt.Text('Customer_Age_bins'),
).properties(width=20)

right = alt.Chart(pyramid_df[pyramid_df['Gender']=="M"]).mark_bar().encode(
    alt.Y('Customer_Age_bins', axis=None),
    alt.X('CLIENTNUM', title='Customers'),
    alt.Color('Gender', scale=color_scale, legend=None)
).properties(title='Male', width=300)

left | middle | right

Save as SVG Save as PNG View Source View Compiled Vega Open in Vega Editor

Here we see that there is no real difference in the distribution of customers by age and gender. The majority of customers are in their 40s. We have only two male customers in their 70s, which is hardly visible here.

📊 Categorical variables

Next, we’ll see how attrited and existing customers are represented in all category groups.

# get the list of categorical variables, remove `Attrition_Flag`
cat_list = sorted((list(set(data.columns) - set(data._get_numeric_data().columns))))[1:]
cat_list

['Card_Category',
 'Customer_Age_bins',
 'Education_Level',
 'Gender',
 'Income_Category',
 'Marital_Status']

def cat_column_bar(cat_column):

    return alt.Chart(data).mark_bar().encode(
        alt.X('count()', stack="normalize", axis=alt.Axis(format='%'), title='Percent'),
        alt.Y(cat_column),
        color=alt.Color('Attrition_Flag', sort="descending")
    ).properties(width=250)
    
(cat_column_bar(cat_list[0]) | 
 cat_column_bar(cat_list[1]) | 
 cat_column_bar(cat_list[2]) | 
 cat_column_bar(cat_list[3]) | 
 cat_column_bar(cat_list[4]) | 
 cat_column_bar(cat_list[5]) 
)

Save as SVG Save as PNG View Source View Compiled Vega Open in Vega Editor

Nothing strange here. Churned customers are almost equally distributed across all category groups.

Individual values for customers with a platinum credit card and the age group 70s deviate from the average due to the small number of customers in these groups.

📊 Comparing Categories

We will look at the average Credit_Limit by age group.

bar = alt.Chart(data).mark_bar().encode(
    alt.X('Customer_Age_bins', axis=alt.Axis(labelAngle=-0), title='Customer age group'),
    alt.Y('mean(Credit_Limit)'),
    alt.Color('Customer_Age_bins', legend=None)
).properties(width=300)

error_bars = alt.Chart(data).mark_errorbar(extent='ci').encode(  
    x='Customer_Age_bins',
    y='Credit_Limit',
    strokeWidth=alt.value(2)
)

bar + error_bars

Save as SVG Save as PNG View Source View Compiled Vega Open in Vega Editor

And now we’ll look at the Credit_Limit by age group and gender.

grouped_bar = alt.Chart(data).mark_bar().encode(
    alt.X('Gender', axis=alt.Axis(title=None, labels=False, ticks=False)),
    alt.Y('mean(Credit_Limit)', title="Mean Credit Limit", axis=alt.Axis(grid=False)),
    alt.Color('Gender', scale=color_scale)
).properties(width=50)

error_bars = alt.Chart(data).mark_errorbar(extent='ci').encode(  
    x='Gender',
    y=alt.Y('Credit_Limit', title='Credit Limit'),
    strokeWidth=alt.value(2)
)

(grouped_bar + error_bars).facet(
    column=alt.Column('Customer_Age_bins', title='Customer age group')
).configure_headerColumn(
    titleOrient='bottom', 
    labelOrient='bottom'
).configure_view(
    stroke='transparent'
)

Save as SVG Save as PNG View Source View Compiled Vega Open in Vega Editor

Females on average are getting credit limits much lower than males! They probably have lower wages. Let’s check this out.

income_cat_order = ['Less than $40K', '$40K - $60K', '$60K - $80K', 
                    '$80K - $120K', '$120K +', 'Unknown']
                    
alt.Chart(data).mark_bar().encode(
    alt.X('count()'),
    alt.Y('Income_Category', sort=income_cat_order, title='Income Category'),
    alt.Color('Gender', scale=color_scale)
).properties(height=130)

Save as SVG Save as PNG View Source View Compiled Vega Open in Vega Editor

This is interesting. There is not a single woman in this dataset who earns $60K-$80K, $80K-$120K, or $120K + a year.

Does this follow their education level?

edu_cat_order = ['Uneducated', 'High School', 'College', 'Graduate', 
                 'Post-Graduate', 'Doctorate', 'Unknown']

alt.Chart(data).mark_bar().encode(
    alt.X('count()'),
    alt.Y('Education_Level', sort=edu_cat_order, title='Education Level'),
    alt.Color('Gender', scale=color_scale)
).properties(height=150)

Save as SVG Save as PNG View Source View Compiled Vega Open in Vega Editor

No. Men and women are equally educated. But compared to women in general, men are clearly more represented in better-paid jobs and in better-paid positions.

📝 Pivot table

Coming back to the business problem, we want to understand something about customers who are leaving. Let’s see if we can see anything obvious about their behavior.

(data
    .groupby(['Attrition_Flag'])
    .agg({'CLIENTNUM':'nunique',  # number of unique customers in each group
          'Customer_Age': 'mean',  # the rest are looking at the mean per group
          'Dependent_count': 'mean',
          'Months_on_book': 'mean',
          'Total_Relationship_Count': 'mean',
          'Months_Inactive_12_mon': 'mean',
          'Contacts_Count_12_mon': 'mean',
          'Credit_Limit': 'mean',
          'Total_Revolving_Bal': 'mean',
          'Avg_Open_To_Buy': 'mean',
          'Total_Amt_Chng_Q4_Q1': 'mean',
          'Total_Trans_Amt': 'mean',
          'Total_Trans_Ct': 'mean',
          'Total_Ct_Chng_Q4_Q1': 'mean',
          'Avg_Utilization_Ratio': 'mean',
          'Avg_Transaction_Value': 'mean',})
    .transpose()
    .assign(Diff = (lambda x: (x['Attrited Customer'] / x['Existing Customer'] - 1)))
    .sort_values('Diff')
    .rename_axis(None, axis=1)
)

	Attrited Customer	Existing Customer	Diff
CLIENTNUM	1627.00	8500.00	-0.81
Total_Revolving_Bal	672.82	1256.60	-0.46
Avg_Utilization_Ratio	0.16	0.30	-0.45
Total_Trans_Ct	44.93	68.67	-0.35
Total_Trans_Amt	3095.03	4654.66	-0.34
Total_Ct_Chng_Q4_Q1	0.55	0.74	-0.25
Total_Relationship_Count	3.28	3.91	-0.16
Total_Amt_Chng_Q4_Q1	0.69	0.77	-0.10
Credit_Limit	8136.04	8726.88	-0.07
Avg_Open_To_Buy	7463.22	7470.27	-0.00
Months_on_book	36.18	35.88	0.01
Customer_Age	46.66	46.26	0.01
Avg_Transaction_Value	63.59	62.43	0.02
Dependent_count	2.40	2.34	0.03
Months_Inactive_12_mon	2.69	2.27	0.18
Contacts_Count_12_mon	2.97	2.36	0.26

We can see that the Total_Revolving_Bal and Avg_Utilization_Ratio show the biggest differences (45% below existing customers), along with Total_Trans_Ct and Total_Trans_Amt (35% below existing customers), and Total_Ct_Chng_Q4_Q1 (25% below existing customers).

Average Card Utilization Ratio

It represents the amount of revolving credit customer is using divided by the total credit available to them.

Let’s look at the Avg_Utilization_Ratio distribution.

alt.Chart(data).mark_bar(opacity=0.7).encode(
    alt.X('Avg_Utilization_Ratio', bin=alt.Bin(step=0.025), 
          axis=alt.Axis(format='%'), title="Average Utilization Ratio"),
    alt.Y('count()', stack=False),
    alt.Color('Attrition_Flag', sort='descending', title='Attrition Flag')
).properties(width=700)

Save as SVG Save as PNG View Source View Compiled Vega Open in Vega Editor

We can see that 57% (934) of churned customers have the average utilization ratio less than 2.5%.

How many churned customers have a credit card utilization ratio of zero?

data.query('(Avg_Utilization_Ratio == 0) and (Attrition_Flag=="Attrited Customer")').shape[0]

There are 893 (55%) churned customers whose credit card utilization ratio is zero, which means they pay off their balances in full before the statement due date.

📊 Transaction Amounts and Counts relationship

Scatter plot with Transaction Amounts and Counts looks interesting.

alt.Chart(data).mark_circle(stroke="white", strokeWidth=0.4).encode(
    alt.X('Total_Trans_Amt', title='Total Transaction Amount'),
    alt.Y('Total_Trans_Ct', title='Total Transaction Count'),
) #.properties(width=500, height=300)

Save as SVG Save as PNG View Source View Compiled Vega Open in Vega Editor

This plot clearly shows three distinct groups. Will anything show up if we add Attrition_Flag as a color? We’ll also add histograms for transaction amounts and counts.

scatter = alt.Chart(data).mark_circle(stroke="white", strokeWidth=0.4).encode(
    alt.X('Total_Trans_Amt', title='Total Transaction Amount'),
    alt.Y('Total_Trans_Ct', title='Total Transaction Count'),
    alt.Color('Attrition_Flag', sort='descending', title='Attrition Flag')
).properties(width=625, height=400)

hist_amt = alt.Chart(data).mark_bar(opacity=0.7).encode(
    alt.X('Total_Trans_Amt', bin=alt.Bin(step=500), title='', 
          scale=alt.Scale(domain=[0, 20000])),
    alt.Y('count()', stack=False, title=''),
    alt.Color('Attrition_Flag', sort='descending')
).properties(width=625, height=150)

hist_ct = alt.Chart(data).mark_bar(opacity=0.7).encode(
    alt.X('count()', stack=False, title=''), 
    alt.Y('Total_Trans_Ct', bin=alt.Bin(maxbins=30), title='',
          scale=alt.Scale(domain=[0, 140])),
    alt.Color('Attrition_Flag', sort='descending')
).properties(width=100, height=400)

# putting them all together!

plot_title = alt.TitleParams(
    "Transaction Amounts and Counts for Customers",
    subtitle = "No Churned Customers above $11K of Spend"
)

(hist_amt & (scatter | hist_ct)).properties(title=plot_title
).configure_title(fontSize=16, dy=-10
).configure_legend(
    offset=-120,
    strokeColor='gray',
    padding=10,
    cornerRadius=10,
)

Save as SVG Save as PNG View Source View Compiled Vega Open in Vega Editor

We can see that there are no customers who have churned that have spent $11K or more,

We can also see churned customers with transaction amounts between $4K and $11K who have separated from existing customers.

We’ll create bins and separate these groups from each other so we can effectively look at their differences.

data['Total_Trans_Amt_bin'] = pd.cut(
    data['Total_Trans_Amt'],
    bins=[0, 4000, 11000, 20000],
    labels=['Bellow $4K','$4K-$11K', 'Above $11K'],
    include_lowest=True,
    right=False)
    
data.groupby(['Total_Trans_Amt_bin', 'Attrition_Flag']).agg({
     'CLIENTNUM':'nunique',
    'Customer_Age': 'mean',
    'Dependent_count': 'mean',
    'Months_on_book': 'mean',
    'Total_Relationship_Count': 'mean',
    'Months_Inactive_12_mon': 'mean',
    'Contacts_Count_12_mon': 'mean',
    'Credit_Limit': 'mean',
    'Total_Revolving_Bal': 'mean',
    'Avg_Open_To_Buy': 'mean',
    'Total_Amt_Chng_Q4_Q1': 'mean',
    'Total_Trans_Amt': 'mean',
    'Total_Trans_Ct': 'mean',
    'Total_Ct_Chng_Q4_Q1': 'mean',
    'Avg_Utilization_Ratio': 'mean',
    'Avg_Transaction_Value': 'mean'
}).T

Total_Trans_Amt_bin	Bellow $4K		$4K-$11K		Above $11K
Attrition_Flag	Attrited Customer	Existing Customer	Attrited Customer	Existing Customer	Attrited Customer	Existing Customer
CLIENTNUM	1318.00	3969.00	309.00	3784.00	0.00	747.00
Customer_Age	46.92	45.96	45.54	46.81	NaN	45.10
Dependent_count	2.41	2.22	2.37	2.47	NaN	2.27
Months_on_book	36.44	35.68	35.06	36.27	NaN	35.01
Total_Relationship_Count	3.34	4.43	3.01	3.69	NaN	2.36
Months_Inactive_12_mon	2.70	2.25	2.64	2.31	NaN	2.22
Contacts_Count_12_mon	3.00	2.53	2.86	2.20	NaN	2.22
Credit_Limit	6709.90	9528.69	14219.05	6834.27	NaN	14053.80
Total_Revolving_Bal	659.87	1299.79	728.06	1189.39	NaN	1367.62
Avg_Open_To_Buy	6050.03	8228.90	13490.99	5644.88	NaN	12686.18
Total_Amt_Chng_Q4_Q1	0.66	0.80	0.86	0.75	NaN	0.78
Total_Trans_Amt	2071.31	2443.86	7461.54	4990.81	NaN	14698.40
Total_Trans_Ct	39.81	51.49	66.79	78.47	NaN	110.34
Total_Ct_Chng_Q4_Q1	0.51	0.74	0.74	0.74	NaN	0.75
Avg_Utilization_Ratio	0.18	0.28	0.09	0.33	NaN	0.18
Avg_Transaction_Value	52.20	47.66	112.19	63.73	NaN	134.28

In terms of average credit limit, churned customers with transaction amounts between $4K and $11K are more like existing customers with transaction amounts above $11K (very similar average credit limit, around $14K). Simply put, they spent less. And as a result, the translation of those points (churned customers) to the left and downward may be seen on the previous graph.

📊 Can we influence the Q4 to Q1 dip?

The Q4-Q1 change (Total_Ct_Chng_Q4_Q1) is significantly different for churned customers too! This indicates that churned customers are spending significantly less (-25% lower than their counterparts) after the holiday season, pulling back much more sigificantly. If we can impact this variable, we can also directly impact the Total Transaction Count!

alt.Chart(data).transform_density(
     'Total_Ct_Chng_Q4_Q1',
     groupby=['Attrition_Flag'],
     as_=['Total_Ct_Chng_Q4_Q1', 'density']
).mark_area(opacity=0.7, clip=True).encode(                
     alt.X('Total_Ct_Chng_Q4_Q1', scale=alt.Scale(domain=[0, 2]),
           title='Total Transaction Count Change (Q4 to Q1)'),   
     alt.Y('density:Q', scale=alt.Scale(domain=[0, 3])),  
     alt.Color('Attrition_Flag', sort='descending')
).properties(title='Change in Transaction Count (Q4 over Q1)', width=500, height=300
).configure_title(fontSize=16, dy=-10, anchor='start')

Save as SVG Save as PNG View Source View Compiled Vega Open in Vega Editor

Recommendations

Promotions during Q1 time to keep the spending levels high. The more we spend the more attached we feel to the card, it’s benefits, etc. A “prevent the cliff” campaign where all customers who have historically shown a strong drop off, will get targeted with this promo.
Customer surveys. For those who are spending above $11,000 to understand why they love our card and what keeps them around. If we were also able to get responses from some of our churn customers, we can develop stronger marketing campaigns around our findings.
Offer loyalty points, cash back, etc.
Look at any historical marketing campaigns to see what we can learn from what worked / didn’t work.

Thanks for reading!