Bank Customer Retention Analysis in Python

What marketing campaigns could help reduce customer churn?
python
altair
EDA
Author

Sandra Jurela

Published

June 10, 2023

Introduction

In this hypothetical case study I’ve been hired as a financial analyst by the marketing department within a bank.

They are faced with more and more customers leaving their credit card services. They would like to understand what characteristics lend themselves to someone who is going to churn so they can proactively go to the customer to provide them better services and turn customers’ decisions in the opposite direction.

As a part of my first analysis, they’ve asked me to take a look at the available data and help them understand how to increase customer retention.

Problem Statement

What marketing campaigns could we implement to help reduce customer churn?

Imports

import pandas as pd
import altair as alt

# Show all columns (instead of cascading columns in the middle)
pd.set_option("display.max_columns", None)
# Don't show numbers in scientific notation
pd.set_option("display.float_format", "{:.2f}".format)

# Enable plotting more than 5000 points
alt.data_transformers.disable_max_rows()
DataTransformerRegistry.enable('default')

I will use Altair for data visualizations. It’s a declarative statistical visualization library for Python, built on top of the powerful Vega-Lite grammar, It’s very intuitive to use, and after some experience with Matplotlib and Seaborn, I’m pleasantly surprised with it. 😇

Data Overview

The dataset consists of about 10,000 customers described with their age, income level, marital status, credit card limit, credit card category, etc. It’s available on Kaggle at the following link.

Data Dictionary

  • CLIENTNUM - Unique identifier for the customer holding the account
  • Attrition_Flag - If the account is closed then Attrited Customer, else Existing Customer
  • Customer_Age - Customer’s age in years
  • Gender - Male = M, Female = F
  • Dependent_count - Number of dependents
  • Education_Level - Educational qualification of the account holder (example: high school, college graduate, etc.)
  • Marital_Status - Married, Single, Divorced, Unknown
  • Income_Category - Annual income category of the account holder (Less than $40K, $40K-$60K, $60K-$80K, $80K-$120K, $120K +
  • Card_Category - Type of card (Blue, Silver, Gold, Platinum)
  • Months_on_book - Period of relationship with bank
  • Total_Relationship_count - Total number of products held by the customer
  • Months_Inactive_12_mon - Number of months inactive in the last 12 months
  • Contacts_Count_12_mon - Number of contacts in the last 12 months
  • Credit_Limit - Credit limit on the credit card
  • Total_Revolving_Bal - Total revolving balance on the credit card
  • Avg_Open_To_Buy- Open to buy credit line (average of last 12 months)
  • Total_Amt_Chng_Q4_Q1 - Change in transaction amount (Q4 over Q1)
  • Total_Trans_Amt - Total transaction amount (last 12 months)
  • Total_Trans_Ct - Total transaction count (last 12 months)
  • Total_Ct_Chng_Q4_Q1 - Change in transaction count (Q4 over Q1)
  • Avg_Utilization_Ratio - Average card utilization ratio
# Read in the data and remove the last two columns irrelevant for the analysis
data = pd.read_csv("data/BankChurners.csv").iloc[: , :-2]

data.head()
CLIENTNUM Attrition_Flag Customer_Age Gender Dependent_count Education_Level Marital_Status Income_Category Card_Category Months_on_book Total_Relationship_Count Months_Inactive_12_mon Contacts_Count_12_mon Credit_Limit Total_Revolving_Bal Avg_Open_To_Buy Total_Amt_Chng_Q4_Q1 Total_Trans_Amt Total_Trans_Ct Total_Ct_Chng_Q4_Q1 Avg_Utilization_Ratio
0 768805383 Existing Customer 45 M 3 High School Married $60K - $80K Blue 39 5 1 3 12691.00 777 11914.00 1.33 1144 42 1.62 0.06
1 818770008 Existing Customer 49 F 5 Graduate Single Less than $40K Blue 44 6 1 2 8256.00 864 7392.00 1.54 1291 33 3.71 0.10
2 713982108 Existing Customer 51 M 3 Graduate Married $80K - $120K Blue 36 4 1 0 3418.00 0 3418.00 2.59 1887 20 2.33 0.00
3 769911858 Existing Customer 40 F 4 High School Unknown Less than $40K Blue 34 3 4 1 3313.00 2517 796.00 1.41 1171 20 2.33 0.76
4 709106358 Existing Customer 40 M 3 Uneducated Married $60K - $80K Blue 21 5 1 0 4716.00 0 4716.00 2.17 816 28 2.50 0.00
data.shape
(10127, 21)

We have a total of 10127 customers described with 21 attributes.

data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10127 entries, 0 to 10126
Data columns (total 21 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   CLIENTNUM                 10127 non-null  int64  
 1   Attrition_Flag            10127 non-null  object 
 2   Customer_Age              10127 non-null  int64  
 3   Gender                    10127 non-null  object 
 4   Dependent_count           10127 non-null  int64  
 5   Education_Level           10127 non-null  object 
 6   Marital_Status            10127 non-null  object 
 7   Income_Category           10127 non-null  object 
 8   Card_Category             10127 non-null  object 
 9   Months_on_book            10127 non-null  int64  
 10  Total_Relationship_Count  10127 non-null  int64  
 11  Months_Inactive_12_mon    10127 non-null  int64  
 12  Contacts_Count_12_mon     10127 non-null  int64  
 13  Credit_Limit              10127 non-null  float64
 14  Total_Revolving_Bal       10127 non-null  int64  
 15  Avg_Open_To_Buy           10127 non-null  float64
 16  Total_Amt_Chng_Q4_Q1      10127 non-null  float64
 17  Total_Trans_Amt           10127 non-null  int64  
 18  Total_Trans_Ct            10127 non-null  int64  
 19  Total_Ct_Chng_Q4_Q1       10127 non-null  float64
 20  Avg_Utilization_Ratio     10127 non-null  float64
dtypes: float64(5), int64(10), object(6)
memory usage: 1.6+ MB

This dataset is actually very clean. We’ll only do some preprocessing steps before the analysis.

Data Preprocessing

Binning Customer Age

We’ll create specific age categories and place ages within a range in these categories. For instance, category 20s will contain ages between 20 and 29, and so on.

# looking for min and max age
print(data['Customer_Age'].min())
print(data['Customer_Age'].max())
26
73
bins = [20, 30, 40, 50, 60, 70, 80]
labels = ['20s', '30s', '40s', '50s', '60s', '70s']

data['Customer_Age_bins'] = pd.cut(
    data['Customer_Age'], 
    bins=bins, 
    labels=labels, 
    include_lowest=True, 
    right=False
)

Average Transaction Value

The average transaction value is the average amount a customer spends on a single purchase.

We can calculate it by dividing the Total_Trans_Amt by the Total_Trans_Ct.

data['Avg_Transaction_Value'] = data['Total_Trans_Amt'] / data['Total_Trans_Ct']

Converting data types for memory optimization

data.info(verbose=False, show_counts=False, memory_usage='deep') 
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10127 entries, 0 to 10126
Columns: 23 entries, CLIENTNUM to Avg_Transaction_Value
dtypes: category(1), float64(6), int64(10), object(6)
memory usage: 5.0 MB

Currently, the total deep memory usage of the DataFrame is 5 MB. We’ll reduce it with smaller data types.

data = data.astype(
    {   
        'Attrition_Flag': 'category',    # categories
        'Gender': 'category',           
        'Education_Level': 'category', 
        'Marital_Status': 'category', 
        'Income_Category': 'category', 
        'Card_Category': 'category',
        'Customer_Age': 'int8',          # integers
        'Dependent_count': 'int8',
        'Months_on_book': 'int8',
        'Total_Relationship_Count':'int8',
        'Months_Inactive_12_mon':'int8',
        'Contacts_Count_12_mon':'int8',
        'Total_Revolving_Bal':'int16',
        'Total_Trans_Amt':'int16',
        'Total_Trans_Ct':'uint8',             
        'Credit_Limit': 'float32',       # floats
        'Avg_Open_To_Buy': 'float32', 
        'Total_Amt_Chng_Q4_Q1': 'float16', 
        'Total_Ct_Chng_Q4_Q1': 'float16', 
        'Avg_Utilization_Ratio': 'float16',
        'Avg_Transaction_Value': 'float32'
    }
)
data.info(memory_usage='deep') 
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10127 entries, 0 to 10126
Data columns (total 23 columns):
 #   Column                    Non-Null Count  Dtype   
---  ------                    --------------  -----   
 0   CLIENTNUM                 10127 non-null  int64   
 1   Attrition_Flag            10127 non-null  category
 2   Customer_Age              10127 non-null  int8    
 3   Gender                    10127 non-null  category
 4   Dependent_count           10127 non-null  int8    
 5   Education_Level           10127 non-null  category
 6   Marital_Status            10127 non-null  category
 7   Income_Category           10127 non-null  category
 8   Card_Category             10127 non-null  category
 9   Months_on_book            10127 non-null  int8    
 10  Total_Relationship_Count  10127 non-null  int8    
 11  Months_Inactive_12_mon    10127 non-null  int8    
 12  Contacts_Count_12_mon     10127 non-null  int8    
 13  Credit_Limit              10127 non-null  float32 
 14  Total_Revolving_Bal       10127 non-null  int16   
 15  Avg_Open_To_Buy           10127 non-null  float32 
 16  Total_Amt_Chng_Q4_Q1      10127 non-null  float16 
 17  Total_Trans_Amt           10127 non-null  int16   
 18  Total_Trans_Ct            10127 non-null  uint8   
 19  Total_Ct_Chng_Q4_Q1       10127 non-null  float16 
 20  Avg_Utilization_Ratio     10127 non-null  float16 
 21  Customer_Age_bins         10127 non-null  category
 22  Avg_Transaction_Value     10127 non-null  float32 
dtypes: category(7), float16(3), float32(3), int16(2), int64(1), int8(6), uint8(1)
memory usage: 438.4 KB

We can see at the bottom that the memory usage is now only 438.4 KB (10 times lower).

Exploratory Data Analysis (EDA)

🔍 Percentage of churned customers

data['Attrition_Flag'].value_counts()   
Existing Customer    8500
Attrited Customer    1627
Name: Attrition_Flag, dtype: int64

We’ll use normalize=True to get the share of churned and existing customers.

churned = data['Attrition_Flag'].value_counts(normalize=True)['Attrited Customer']

print(round(churned * 100 , 2), '% of customers have churned.')
16.07 % of customers have churned.

📝 Summary Statistics

data.describe()
CLIENTNUM Customer_Age Dependent_count Months_on_book Total_Relationship_Count Months_Inactive_12_mon Contacts_Count_12_mon Credit_Limit Total_Revolving_Bal Avg_Open_To_Buy Total_Amt_Chng_Q4_Q1 Total_Trans_Amt Total_Trans_Ct Total_Ct_Chng_Q4_Q1 Avg_Utilization_Ratio Avg_Transaction_Value
count 10127.00 10127.00 10127.00 10127.00 10127.00 10127.00 10127.00 10127.00 10127.00 10127.00 10127.00 10127.00 10127.00 10127.00 10127.00 10127.00
mean 739177606.33 46.33 2.35 35.93 3.81 2.34 2.46 8632.00 1162.81 7469.17 0.76 4404.09 64.86 0.71 0.27 62.61
std 36903783.45 8.02 1.30 7.99 1.55 1.01 1.11 9088.76 814.99 9090.68 0.22 3397.13 23.47 0.24 0.28 26.40
min 708082083.00 26.00 0.00 13.00 1.00 0.00 0.00 1438.30 0.00 3.00 0.00 510.00 10.00 0.00 0.00 19.14
25% 713036770.50 41.00 1.00 31.00 3.00 2.00 2.00 2555.00 359.00 1324.50 0.63 2155.50 45.00 0.58 0.02 47.51
50% 717926358.00 46.00 2.00 36.00 4.00 2.00 2.00 4549.00 1276.00 3474.00 0.74 3899.00 67.00 0.70 0.18 55.79
75% 773143533.00 52.00 3.00 40.00 5.00 3.00 3.00 11067.50 1784.00 9859.00 0.86 4741.00 81.00 0.82 0.50 65.48
max 828343083.00 73.00 5.00 56.00 6.00 6.00 6.00 34516.00 2517.00 34516.00 3.40 18484.00 139.00 3.71 1.00 190.19


Here we can see:

  1. The longest customer in this dataset has been around for 56 months, or about 4 years and a half. (Max)

  2. The average number of relationships a customer has is 4. (Mean and median agree here.)

  3. The average credit limit is $8.6K, but the median credit limit is much lower at $4.5K. (signals some skew in this variable)

📊 Distributions

Histograms

Let’s take a look at the distribution for the Months_on_book variable.

alt.Chart(data).mark_bar().encode(   
    alt.X('Months_on_book', bin=alt.Bin(maxbins=30), title="Months on book"),
    alt.Y('count()')
).properties(width=500)

It looks pretty normal outside of a very strong peak at ~36 months (roughly 2800 customers). It might be a data error (default months for those without a record), or there was a major marketing campaign that brought in a lot of users 36 months ago.

Now let’s take a look at the Credit_Limit.

hist = alt.Chart(data).mark_bar().encode(   
    alt.X('Credit_Limit', bin=alt.Bin(step=1500), title="Credit Limit"),
    alt.Y('count()')
).properties(width=500)

aggregates = alt.Chart(data).transform_aggregate(
    mean='mean(Credit_Limit)',
    median='median(Credit_Limit)',
).transform_fold(
    ['mean', 'median']
).mark_rule().encode(
    x='value:Q',
    color='key:N',
    size=alt.value(2)
)

hist + aggregates

It’s clear that the Credit_Limit is a skewed variable, being skewed higher (or to the right) by a few customers with high credit limits.

Boxplot

Let’s look at the total number of transactions by gender.

color_scale = alt.Scale(domain=['M', 'F'],
                        range=['#1f77b4', '#e377c2'])
                        
alt.Chart(data).mark_boxplot(size=25).encode(
    alt.X('Gender', axis=alt.Axis(labelAngle=-0)),
    alt.Y('Total_Trans_Ct'),
    alt.Color('Gender', scale=color_scale)
).properties(width=150)

This plot quickly shows us that the median transaction count is slightly higher for females than males and outliers seem to be present among both groups.

Pyramid chart

Was there a difference in the distribution of customers by age and gender? A pyramid chart can help us find the answer.

To get the data in the format we need, we’ll aggregate the data up to the age and gender level.

pyramid_df = (data
    .groupby(['Gender', 'Customer_Age_bins'])
    ['CLIENTNUM']
    .nunique()
    .reset_index()
)

left = alt.Chart(pyramid_df[pyramid_df['Gender']=="F"]).mark_bar().encode(
    alt.Y('Customer_Age_bins', axis=None),
    alt.X('CLIENTNUM', title='Customers', sort='descending'),
    alt.Color('Gender', scale=color_scale, legend=None)
).properties(title='Female', width=350)

middle = alt.Chart(pyramid_df).mark_text().encode(
    alt.Y('Customer_Age_bins', axis=None),
    alt.Text('Customer_Age_bins'),
).properties(width=20)

right = alt.Chart(pyramid_df[pyramid_df['Gender']=="M"]).mark_bar().encode(
    alt.Y('Customer_Age_bins', axis=None),
    alt.X('CLIENTNUM', title='Customers'),
    alt.Color('Gender', scale=color_scale, legend=None)
).properties(title='Male', width=300)

left | middle | right

Here we see that there is no real difference in the distribution of customers by age and gender. The majority of customers are in their 40s. We have only two male customers in their 70s, which is hardly visible here.

📊 Categorical variables

Next, we’ll see how attrited and existing customers are represented in all category groups.

# get the list of categorical variables, remove `Attrition_Flag`
cat_list = sorted((list(set(data.columns) - set(data._get_numeric_data().columns))))[1:]
cat_list
['Card_Category',
 'Customer_Age_bins',
 'Education_Level',
 'Gender',
 'Income_Category',
 'Marital_Status']
def cat_column_bar(cat_column):

    return alt.Chart(data).mark_bar().encode(
        alt.X('count()', stack="normalize", axis=alt.Axis(format='%'), title='Percent'),
        alt.Y(cat_column),
        color=alt.Color('Attrition_Flag', sort="descending")
    ).properties(width=250)
    
(cat_column_bar(cat_list[0]) | 
 cat_column_bar(cat_list[1]) | 
 cat_column_bar(cat_list[2]) | 
 cat_column_bar(cat_list[3]) | 
 cat_column_bar(cat_list[4]) | 
 cat_column_bar(cat_list[5]) 
)


Nothing strange here. Churned customers are almost equally distributed across all category groups.

Individual values for customers with a platinum credit card and the age group 70s deviate from the average due to the small number of customers in these groups.

📊 Comparing Categories

We will look at the average Credit_Limit by age group.

bar = alt.Chart(data).mark_bar().encode(
    alt.X('Customer_Age_bins', axis=alt.Axis(labelAngle=-0), title='Customer age group'),
    alt.Y('mean(Credit_Limit)'),
    alt.Color('Customer_Age_bins', legend=None)
).properties(width=300)

error_bars = alt.Chart(data).mark_errorbar(extent='ci').encode(  
    x='Customer_Age_bins',
    y='Credit_Limit',
    strokeWidth=alt.value(2)
)

bar + error_bars

And now we’ll look at the Credit_Limit by age group and gender.

grouped_bar = alt.Chart(data).mark_bar().encode(
    alt.X('Gender', axis=alt.Axis(title=None, labels=False, ticks=False)),
    alt.Y('mean(Credit_Limit)', title="Mean Credit Limit", axis=alt.Axis(grid=False)),
    alt.Color('Gender', scale=color_scale)
).properties(width=50)

error_bars = alt.Chart(data).mark_errorbar(extent='ci').encode(  
    x='Gender',
    y=alt.Y('Credit_Limit', title='Credit Limit'),
    strokeWidth=alt.value(2)
)

(grouped_bar + error_bars).facet(
    column=alt.Column('Customer_Age_bins', title='Customer age group')
).configure_headerColumn(
    titleOrient='bottom', 
    labelOrient='bottom'
).configure_view(
    stroke='transparent'
)

Females on average are getting credit limits much lower than males! They probably have lower wages. Let’s check this out.

income_cat_order = ['Less than $40K', '$40K - $60K', '$60K - $80K', 
                    '$80K - $120K', '$120K +', 'Unknown']
                    
alt.Chart(data).mark_bar().encode(
    alt.X('count()'),
    alt.Y('Income_Category', sort=income_cat_order, title='Income Category'),
    alt.Color('Gender', scale=color_scale)
).properties(height=130)

This is interesting. There is not a single woman in this dataset who earns $60K-$80K, $80K-$120K, or $120K + a year.

Does this follow their education level?

edu_cat_order = ['Uneducated', 'High School', 'College', 'Graduate', 
                 'Post-Graduate', 'Doctorate', 'Unknown']

alt.Chart(data).mark_bar().encode(
    alt.X('count()'),
    alt.Y('Education_Level', sort=edu_cat_order, title='Education Level'),
    alt.Color('Gender', scale=color_scale)
).properties(height=150)

No. Men and women are equally educated. But compared to women in general, men are clearly more represented in better-paid jobs and in better-paid positions.

📝 Pivot table

Coming back to the business problem, we want to understand something about customers who are leaving. Let’s see if we can see anything obvious about their behavior.

(data
    .groupby(['Attrition_Flag'])
    .agg({'CLIENTNUM':'nunique',  # number of unique customers in each group
          'Customer_Age': 'mean',  # the rest are looking at the mean per group
          'Dependent_count': 'mean',
          'Months_on_book': 'mean',
          'Total_Relationship_Count': 'mean',
          'Months_Inactive_12_mon': 'mean',
          'Contacts_Count_12_mon': 'mean',
          'Credit_Limit': 'mean',
          'Total_Revolving_Bal': 'mean',
          'Avg_Open_To_Buy': 'mean',
          'Total_Amt_Chng_Q4_Q1': 'mean',
          'Total_Trans_Amt': 'mean',
          'Total_Trans_Ct': 'mean',
          'Total_Ct_Chng_Q4_Q1': 'mean',
          'Avg_Utilization_Ratio': 'mean',
          'Avg_Transaction_Value': 'mean',})
    .transpose()
    .assign(Diff = (lambda x: (x['Attrited Customer'] / x['Existing Customer'] - 1)))
    .sort_values('Diff')
    .rename_axis(None, axis=1)
)
Attrited Customer Existing Customer Diff
CLIENTNUM 1627.00 8500.00 -0.81
Total_Revolving_Bal 672.82 1256.60 -0.46
Avg_Utilization_Ratio 0.16 0.30 -0.45
Total_Trans_Ct 44.93 68.67 -0.35
Total_Trans_Amt 3095.03 4654.66 -0.34
Total_Ct_Chng_Q4_Q1 0.55 0.74 -0.25
Total_Relationship_Count 3.28 3.91 -0.16
Total_Amt_Chng_Q4_Q1 0.69 0.77 -0.10
Credit_Limit 8136.04 8726.88 -0.07
Avg_Open_To_Buy 7463.22 7470.27 -0.00
Months_on_book 36.18 35.88 0.01
Customer_Age 46.66 46.26 0.01
Avg_Transaction_Value 63.59 62.43 0.02
Dependent_count 2.40 2.34 0.03
Months_Inactive_12_mon 2.69 2.27 0.18
Contacts_Count_12_mon 2.97 2.36 0.26

We can see that the Total_Revolving_Bal and Avg_Utilization_Ratio show the biggest differences (45% below existing customers), along with Total_Trans_Ct and Total_Trans_Amt (35% below existing customers), and Total_Ct_Chng_Q4_Q1 (25% below existing customers).

Average Card Utilization Ratio

It represents the amount of revolving credit customer is using divided by the total credit available to them.

Let’s look at the Avg_Utilization_Ratio distribution.

alt.Chart(data).mark_bar(opacity=0.7).encode(
    alt.X('Avg_Utilization_Ratio', bin=alt.Bin(step=0.025), 
          axis=alt.Axis(format='%'), title="Average Utilization Ratio"),
    alt.Y('count()', stack=False),
    alt.Color('Attrition_Flag', sort='descending', title='Attrition Flag')
).properties(width=700)

We can see that 57% (934) of churned customers have the average utilization ratio less than 2.5%.

How many churned customers have a credit card utilization ratio of zero?

data.query('(Avg_Utilization_Ratio == 0) and (Attrition_Flag=="Attrited Customer")').shape[0]
893

There are 893 (55%) churned customers whose credit card utilization ratio is zero, which means they pay off their balances in full before the statement due date.

📊 Transaction Amounts and Counts relationship

Scatter plot with Transaction Amounts and Counts looks interesting.

alt.Chart(data).mark_circle(stroke="white", strokeWidth=0.4).encode(
    alt.X('Total_Trans_Amt', title='Total Transaction Amount'),
    alt.Y('Total_Trans_Ct', title='Total Transaction Count'),
) #.properties(width=500, height=300)

This plot clearly shows three distinct groups. Will anything show up if we add Attrition_Flag as a color? We’ll also add histograms for transaction amounts and counts.

scatter = alt.Chart(data).mark_circle(stroke="white", strokeWidth=0.4).encode(
    alt.X('Total_Trans_Amt', title='Total Transaction Amount'),
    alt.Y('Total_Trans_Ct', title='Total Transaction Count'),
    alt.Color('Attrition_Flag', sort='descending', title='Attrition Flag')
).properties(width=625, height=400)

hist_amt = alt.Chart(data).mark_bar(opacity=0.7).encode(
    alt.X('Total_Trans_Amt', bin=alt.Bin(step=500), title='', 
          scale=alt.Scale(domain=[0, 20000])),
    alt.Y('count()', stack=False, title=''),
    alt.Color('Attrition_Flag', sort='descending')
).properties(width=625, height=150)

hist_ct = alt.Chart(data).mark_bar(opacity=0.7).encode(
    alt.X('count()', stack=False, title=''), 
    alt.Y('Total_Trans_Ct', bin=alt.Bin(maxbins=30), title='',
          scale=alt.Scale(domain=[0, 140])),
    alt.Color('Attrition_Flag', sort='descending')
).properties(width=100, height=400)

# putting them all together!

plot_title = alt.TitleParams(
    "Transaction Amounts and Counts for Customers",
    subtitle = "No Churned Customers above $11K of Spend"
)

(hist_amt & (scatter | hist_ct)).properties(title=plot_title
).configure_title(fontSize=16, dy=-10
).configure_legend(
    offset=-120,
    strokeColor='gray',
    padding=10,
    cornerRadius=10,
)

We can see that there are no customers who have churned that have spent $11K or more,

We can also see churned customers with transaction amounts between $4K and $11K who have separated from existing customers.

We’ll create bins and separate these groups from each other so we can effectively look at their differences.

data['Total_Trans_Amt_bin'] = pd.cut(
    data['Total_Trans_Amt'],
    bins=[0, 4000, 11000, 20000],
    labels=['Bellow $4K','$4K-$11K', 'Above $11K'],
    include_lowest=True,
    right=False)
    
data.groupby(['Total_Trans_Amt_bin', 'Attrition_Flag']).agg({
     'CLIENTNUM':'nunique',
    'Customer_Age': 'mean',
    'Dependent_count': 'mean',
    'Months_on_book': 'mean',
    'Total_Relationship_Count': 'mean',
    'Months_Inactive_12_mon': 'mean',
    'Contacts_Count_12_mon': 'mean',
    'Credit_Limit': 'mean',
    'Total_Revolving_Bal': 'mean',
    'Avg_Open_To_Buy': 'mean',
    'Total_Amt_Chng_Q4_Q1': 'mean',
    'Total_Trans_Amt': 'mean',
    'Total_Trans_Ct': 'mean',
    'Total_Ct_Chng_Q4_Q1': 'mean',
    'Avg_Utilization_Ratio': 'mean',
    'Avg_Transaction_Value': 'mean'
}).T
Total_Trans_Amt_bin Bellow $4K $4K-$11K Above $11K
Attrition_Flag Attrited Customer Existing Customer Attrited Customer Existing Customer Attrited Customer Existing Customer
CLIENTNUM 1318.00 3969.00 309.00 3784.00 0.00 747.00
Customer_Age 46.92 45.96 45.54 46.81 NaN 45.10
Dependent_count 2.41 2.22 2.37 2.47 NaN 2.27
Months_on_book 36.44 35.68 35.06 36.27 NaN 35.01
Total_Relationship_Count 3.34 4.43 3.01 3.69 NaN 2.36
Months_Inactive_12_mon 2.70 2.25 2.64 2.31 NaN 2.22
Contacts_Count_12_mon 3.00 2.53 2.86 2.20 NaN 2.22
Credit_Limit 6709.90 9528.69 14219.05 6834.27 NaN 14053.80
Total_Revolving_Bal 659.87 1299.79 728.06 1189.39 NaN 1367.62
Avg_Open_To_Buy 6050.03 8228.90 13490.99 5644.88 NaN 12686.18
Total_Amt_Chng_Q4_Q1 0.66 0.80 0.86 0.75 NaN 0.78
Total_Trans_Amt 2071.31 2443.86 7461.54 4990.81 NaN 14698.40
Total_Trans_Ct 39.81 51.49 66.79 78.47 NaN 110.34
Total_Ct_Chng_Q4_Q1 0.51 0.74 0.74 0.74 NaN 0.75
Avg_Utilization_Ratio 0.18 0.28 0.09 0.33 NaN 0.18
Avg_Transaction_Value 52.20 47.66 112.19 63.73 NaN 134.28

In terms of average credit limit, churned customers with transaction amounts between $4K and $11K are more like existing customers with transaction amounts above $11K (very similar average credit limit, around $14K). Simply put, they spent less. And as a result, the translation of those points (churned customers) to the left and downward may be seen on the previous graph.

📊 Can we influence the Q4 to Q1 dip?

The Q4-Q1 change (Total_Ct_Chng_Q4_Q1) is significantly different for churned customers too! This indicates that churned customers are spending significantly less (-25% lower than their counterparts) after the holiday season, pulling back much more sigificantly. If we can impact this variable, we can also directly impact the Total Transaction Count!

alt.Chart(data).transform_density(
     'Total_Ct_Chng_Q4_Q1',
     groupby=['Attrition_Flag'],
     as_=['Total_Ct_Chng_Q4_Q1', 'density']
).mark_area(opacity=0.7, clip=True).encode(                
     alt.X('Total_Ct_Chng_Q4_Q1', scale=alt.Scale(domain=[0, 2]),
           title='Total Transaction Count Change (Q4 to Q1)'),   
     alt.Y('density:Q', scale=alt.Scale(domain=[0, 3])),  
     alt.Color('Attrition_Flag', sort='descending')
).properties(title='Change in Transaction Count (Q4 over Q1)', width=500, height=300
).configure_title(fontSize=16, dy=-10, anchor='start')

Recommendations

  • Promotions during Q1 time to keep the spending levels high. The more we spend the more attached we feel to the card, it’s benefits, etc. A “prevent the cliff” campaign where all customers who have historically shown a strong drop off, will get targeted with this promo.

  • Customer surveys. For those who are spending above $11,000 to understand why they love our card and what keeps them around. If we were also able to get responses from some of our churn customers, we can develop stronger marketing campaigns around our findings.

  • Offer loyalty points, cash back, etc.

  • Look at any historical marketing campaigns to see what we can learn from what worked / didn’t work.


Thanks for reading!