
CLV the right way using probabilistic programming

John Doxaras

2019-05-31

 

1. Introduction

 

As Data Scientists at Warply, all our projects are driven by a relationship commerce imperative. Our experience shows that metrics such as broad averages are usually not enough to detect hidden patterns at the customer level. Such broad metrics make it difficult to gain a full picture of what makes each customer unique, and therefore difficult to make effective and profitable business decisions.

In order to address this problem, the models that our Data Science team builds focus mainly on the purchasing behaviour of each individual customer. Such an approach is particularly useful when we try to predict the Customer Lifetime Value (abbreviated CLV) of customers, which is defined as the total amount a customer will spend over their lifetime.

Unfortunately, most companies that estimate the CLV of their customers forecast customer spending in aggregate, faceting by acquisition channel or other broad categories based on geography, sales channels, or demographics. Our experience shows that such an approach is completely wrong and can lead to large, unseen biases. Companies that ignore individual-level variation in CLV are likely to over- or underestimate their customers’ future spending. This can lead to an incorrect picture of the health of the business, resulting in misallocated resources, misguided strategies, and missed revenue opportunities.

The ultimate goal is to understand the potential spending patterns of individual customers in order to unlock the full potential of relationship-building and loyalty programs.

The approach that we describe in this blogpost leverages the latest advances in probabilistic programming in order to estimate the full distribution of potential value for every single customer in our dataset. We will use the Python package PyStan (a package for Bayesian inference that uses the No-U-Turn Sampler, a faster variant of Hamiltonian Monte Carlo) to estimate CLV with individual-level granularity.

 

2. Load & Preprocess Data

In [1]:
from IPython.display import Image
Image(filename='/Users/warply/Desktop/CLV.png', width = 1000, height = 1000)
Out[1]:
 

Customers exhibit a wide range of spending behavior, from one-and-done transactions to long-term relationships or subscriptions (loyal customers). Understanding and leveraging those differences lets companies build and protect customer relationships and increase market share. For example, if two companies X and Y have the same average customer churn rate, but company X has a mix of very loyal customers and one-and-done customers, while all of company Y’s customers are homogeneous, company X can be much more valuable, provided it recognizes this and acts on it to its advantage.

 

At Warply, we provide Loyalty solutions to hundreds of customers, so taking customer-level heterogeneity into account is of paramount importance!

 

To illustrate our approach to CLV modeling, we use a simplified dataset, which includes only customer IDs, purchase amounts and dates. The dataset comes from a Big Player in the Retail Industry that uses Warply's Loyalty Solutions. All the data has been cleaned and anonymized due to a confidentiality agreement. The first purchase in the dataset occurred on the 7th of March, 2019 and the last on the 10th of May.

 

We won't dive into the math behind the model, as it can get very tricky. (For those interested in the inner workings of the CLV model, the papers by Peter S. Fader and Bruce G. S. Hardie that explain the mathematics are well worth reading.) Instead, we will focus on the intuition, which is very simple but equally important!

 

We break each customer’s spending into three component parts (a toy example after the list shows how they combine):

1) Transaction rate: the number of transactions a customer makes in a given time period.

2) Dropout rate: the probability that a customer stops shopping over a given time period, and

3) Average spending: the customer’s average transaction amount.
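
To get a feel for how these three parts combine, here is a toy calculation in Python. The parameter values are purely illustrative assumptions, not estimates from our data; the expected-transactions formula is the same one that appears later in the generated quantities block of our Stan model.

#toy example: how transaction rate, dropout rate and average spend combine
#(illustrative parameter values, not estimates from our data)
import numpy as np

lam = 0.1   #transaction rate: ~0.1 purchases per day while active
mu = 0.02   #dropout rate: ~2% chance of churning per day
mx = 25.0   #average transaction amount in euros
t = 60      #forecast horizon in days

#expected transactions over the horizon, discounted by possible dropout
expected_transactions = (lam / mu) * (1 - np.exp(-mu * t))
expected_value = expected_transactions * mx
print("Expected value over {} days: {:.2f}".format(t, expected_value))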

 

The steps that we follow below are typical of a Data Science pipeline and involve: Data Cleaning/Preprocessing, Data Visualization, Model Selection, Model Evaluation and a lot of iterations!

 

As you can see in the cells below, first we import all the required libraries for this project, then we load the dataset into a Pandas DataFrame, convert the transaction_date column to a datetime column, and finally check the resulting dataframe for missing values.

In [24]:
#import all libraries
import os
import sys
import time
from datetime import datetime

import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import pystan
import arviz as az
import joypy
%matplotlib inline
In [27]:
#load data
df = pd.read_csv(filepath_or_buffer = "/Users/warply/Desktop/transactional_data.txt")
In [4]:
#display the first 5 rows of the dataframe 
df.head(n = 5)
Out[4]:
  customer_id transaction_date value
0 8400 2019-03-07 63.32
1 4180 2019-03-07 58.97
2 6660 2019-03-07 25.69
3 12293 2019-03-07 37.44
4 16145 2019-03-07 9.28
In [5]:
#display the last 5 rows of the dataframe 
df.tail(n = 5)
Out[5]:
  customer_id transaction_date value
36347 17530 2019-05-10 106.73
36348 12839 2019-05-10 27.24
36349 10119 2019-05-10 28.58
36350 16100 2019-05-10 441.49
36351 5017 2019-05-10 5.49
In [6]:
print("Number of rows in the dataframe: {:,}".format(df.shape[0]))
 
Number of rows in the dataframe: 36,352
In [7]:
print("Date of First Transaction: {}".format(df["transaction_date"].min()))
 
Date of First Transaction: 2019-03-07
In [8]:
print("Date of Last Transaction: {}".format(df["transaction_date"].max()))
 
Date of Last Transaction: 2019-05-10
In [9]:
#get the data type of each column
df.dtypes.to_frame(name = "data_types")
Out[9]:
  data_types
customer_id int64
transaction_date object
value float64
In [10]:
#convert dates into a datetime format
df["transaction_date"] = pd.to_datetime(df["transaction_date"])
In [11]:
#check for missing values
df.isnull().sum(axis = 0).to_frame(name = "number_of_missing_values")
Out[11]:
  number_of_missing_values
customer_id 0
transaction_date 0
value 0
 

3. Visualize Transactional Data for the first 100 Customers

 

Using a scatterplot, we will plot the purchasing behaviour of the first one hundred customers in the dataset for the time period between the 7th of March, 2019 and the 10th of May.

Each dot in the plot represents a transaction; the x-axis shows when it occurred in time, and the size of the dot shows its relative amount. Some customers have been making transactions regularly throughout the entire period; others have made only a few, sparse transactions. Some customers tend to spend a lot in each transaction, others generally spend little.

The most interesting part here is the purchasing pattern a few days before the 7th of April, where the purchase frequency increases for almost all customers.

In [12]:
#get the first 100 customers
consumer_mask = df["customer_id"] < 100
onehundred_customers = df[consumer_mask].sort_values(by = "customer_id", ascending = True)
In [13]:
print("The dataframe contains: {} rows.".format(onehundred_customers.shape[0]))
 
The dataframe contains: 191 rows.
In [14]:
print("The dataframe contains: {} customers.".format(onehundred_customers["customer_id"].nunique()))
 
The dataframe contains: 100 customers.
In [15]:
print("The first transaction happened on: {}".format(str(onehundred_customers["transaction_date"].min()).split(" ")[0]))
 
The first transaction happened on: 2019-03-07
In [16]:
print("The last transaction happened on: {}".format(str(onehundred_customers["transaction_date"].max()).split(" ")[0]))
 
The last transaction happened on: 2019-05-10
In [17]:
#get the labels and indexes for the first 100 customers
onehundred_customer_indexes = onehundred_customers["customer_id"].unique().tolist()
onehundred_customer_labels = list(map(lambda e: "Customer Id: {}".format(e), onehundred_customer_indexes))
In [21]:
#register matplotlib converters
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()
from matplotlib.ticker import MultipleLocator
import matplotlib.dates as dt

fig, ax = plt.subplots(figsize = (60, 45),
                       dpi = 100)

scatter = ax.scatter(onehundred_customers["transaction_date"],
                     onehundred_customers["customer_id"],
                     s = 10*onehundred_customers["value"],
                     alpha=0.5,
                     linewidths = 2.,
                     edgecolors = "r")

#set the grid for each minor tick
minor_ticks = np.arange(0, 100, 1)
ax.set_yticks(minor_ticks, minor=True)
ax.grid(which='minor', alpha=0.5)
ax.xaxis.grid(linewidth=1.)


#hide the major ticks on the y-axis and plot the customer labels as minor tick labels
ax.yaxis.set_major_locator(plt.NullLocator())
ax.set_yticklabels(onehundred_customer_labels, minor = True, size = 30)

#set the limits of the x-axis
ax.set_xlim((datetime(2019, 3, 6), datetime(2019, 5, 20)))

#set the ticks and labels on the x-axis
ax.xaxis.set_minor_locator(dt.DayLocator())
tick_labels = [datetime(2019, 3, 7)] + [datetime(2019, month, 7) for month in range(4, 6)] + [datetime(2019, 5, 10)]
ax.set_xticks(tick_labels)
ax.set_xticklabels([t.strftime("%Y/%m/%d") for t in tick_labels], size = 35)

#rotate the last label
ax.get_xticklabels()[-1].set_rotation(90)


#set the title name
ax.set_title("Customer Purchases - First 100 Customers", size = 65, y = 1.004)

#set the vertical line
ax.vlines(x = datetime(2019, 5, 10),
          ymin = 0,
          ymax = 100,
          colors = "k",
          linestyles = "dashed",
          **{"linewidths": 5})

#set the text right to the vertical line
ax.text(s = '?',
        x = 0.935,
        y = 0.5,
        horizontalalignment='center',
        verticalalignment='center',
        wrap = True,
        transform=ax.transAxes,
        fontsize = 130,
        **{"clip_on":True})


ax.text(s = 'Forecast Period',
        x = 0.935,
        y = 0.019,
        horizontalalignment='center',
        verticalalignment='center',
        wrap = True,
        transform=ax.transAxes,
        fontsize = 50,
        **{"clip_on":True})
Out[21]:
Text(0.935, 0.019, 'Forecast Period')
 
 

4. Transform data to RFM format

 

In order to use the Bayesian model, we transform the data to an RFM format by calculating the following columns/quantities:

  • days_since_first_transaction: the days that have passed since the first purchase of a customer,
  • days_since_last_transaction: the days that have passed since the last purchase of a customer,
  • average_money_spent: the average amount of money a customer has spent during his "lifetime" to date, and
  • number_of_repeated_transactions: the number of repeated transactions a customer has made since his first purchase.

In [106]:
#display the first 5 rows of the original dataframe
df.head(n = 5)
Out[106]:
  customer_id transaction_date value
0 8400 2019-03-07 63.32
1 4180 2019-03-07 58.97
2 6660 2019-03-07 25.69
3 12293 2019-03-07 37.44
4 16145 2019-03-07 9.28
In [107]:
#sort the dataframe in ascending order by the customer_id column
df = df.sort_values(by = "customer_id", ascending = True)
In [111]:
#find the first transaction of each customer
first_transaction_df = df.groupby(["customer_id"])["transaction_date"].min().to_frame(name = "first_transaction")

#find the last transaction of each customer
last_transaction_df = df.groupby(["customer_id"])["transaction_date"].max().to_frame(name = "last_transaction")
In [112]:
#find the number of days since the first transaction for each customer
days_since_first_transaction = first_transaction_df.apply(lambda t: datetime.now() - t)\
                                                   .loc[:,"first_transaction"].dt.days\
                                                   .to_frame(name = "days_since_first_transaction")
In [113]:
#find the number of days since the last transaction for each customer
days_since_last_transaction = last_transaction_df.apply(lambda t: datetime.now() - t)\
                                                 .loc[:,"last_transaction"].dt.days\
                                                 .to_frame(name = "days_since_last_transaction")
In [114]:
#find the average amount each customer spent
average_money_spent = df.groupby(["customer_id"])["value"].mean().to_frame(name = "average_money_spent")
In [115]:
#find the number of repeated transactions for each customer
number_of_repeated_transactions = df.groupby(["customer_id"])["transaction_date"].count().to_frame(name = "number_of_repeated_transactions")
In [132]:
#get the customer id as a separate column
customer_id = number_of_repeated_transactions.reset_index(drop = False).loc[:,["customer_id"]]
customer_id = customer_id.set_index("customer_id", drop = False)
In [133]:
#concatenate all the tables above in a final dataframe
final_df = pd.concat([customer_id,
                      days_since_first_transaction,
                      days_since_last_transaction,
                      average_money_spent,
                      number_of_repeated_transactions],
                      axis = "columns")
In [134]:
#display the first 5 rows in the dataframe
final_df.head(n = 5)
Out[134]:
  customer_id days_since_first_transaction days_since_last_transaction average_money_spent number_of_repeated_transactions
customer_id          
0 0 58 43 16.0475 4
1 1 57 57 7.2900 1
2 2 33 33 208.9200 1
3 3 67 67 19.9900 1
4 4 25 25 182.3800 1
In [136]:
print("The number of unique customers is equal to: {:,}".format(final_df["customer_id"].nunique()))
 
The number of unique customers is equal to: 18,004
 

We rename all the columns above and finally get the following data on each customer:

    1. x: total repeat transactions (Frequency)
    2. t_x: time of most recent transaction (Recency)
    3. t_cal: time since first transaction, and
    4. mx: average transaction amount
In [138]:
#rename the columns according to the following scheme:

# number_of_repeated_transactions : x
# days_since_last_transaction : t_x (most recent transaction)
# days_since_first_transaction : t_cal (oldest transaction)
# average_money_spent : mx (average transaction amount)
final_df.rename(columns = {"number_of_repeated_transactions":"x",
                           "days_since_last_transaction":"t_x",
                           "days_since_first_transaction":"t_cal",
                           "average_money_spent":"mx"},
                           inplace = True)
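
As an aside, the whole RFM table above can also be assembled in a single pass with one groupby; a minimal sketch, equivalent to the step-by-step construction (here now stands in for the analysis date used in the cells above):

#one-pass alternative for building the RFM table (a sketch, equivalent
#to the step-by-step construction above)
now = datetime.now()
g = df.groupby("customer_id")
rfm = pd.DataFrame({"t_cal": (now - g["transaction_date"].min()).dt.days,
                    "t_x":   (now - g["transaction_date"].max()).dt.days,
                    "mx":    g["value"].mean(),
                    "x":     g["transaction_date"].count()})
rfm["customer_id"] = rfm.index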
 

5. Define & Compile the Bayesian Model

 

Now, let's create the PyStan CLV model.

In order to estimate distributions for each customer's transaction rate, churn rate and lifetime value, we use the following Bayesian model of customer lifetime value, based on the work of Fader and Hardie (2003 and 2013).

For each customer we define the following probabilistic quantities:

  • P(X(t) = x) ~ Poisson(λt): while active, the number of transactions a customer makes in a period of length t follows a Poisson distribution with rate parameter λ.
  • τ ~ Exponential(μ): each customer has an active period of length τ, which follows an Exponential distribution with dropout rate μ (density μ e^(-μτ)).
  • mx ~ Gamma(px, vx): each customer's individual transaction amounts follow a Gamma distribution with shape parameter p and rate parameter v. This implies that the average of x+1 transaction amounts, mx, follows a Gamma distribution with shape px = p(x+1) and rate vx = v(x+1).
 

For all customers we define the following population-level quantities (a prior-predictive sketch after this list makes the generative story concrete):

  • λ ~ Gamma(rr, α): all customers' transaction rates are drawn from a common Gamma distribution.
  • μ ~ Gamma(ss, β): all customers' dropout rates are drawn from a common Gamma distribution.
  • v ~ Gamma(q, y): the rate parameters of the customer-level spending distributions are drawn from a common Gamma distribution.
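
Here is that prior-predictive sketch in NumPy. The hyperparameter values are illustrative assumptions, not fitted estimates, and note that NumPy parameterizes the Gamma distribution by scale, i.e. 1/rate:

#prior-predictive sketch of the generative model (illustrative hyperparameters)
np.random.seed(0)
N = 5                                      #five synthetic customers
rr, alpha = 1.0, 5.0                       #transaction-rate hyperparameters (assumed)
ss, beta = 1.0, 20.0                       #dropout-rate hyperparameters (assumed)

lam = np.random.gamma(rr, 1.0 / alpha, N)  #per-customer transaction rates
mu = np.random.gamma(ss, 1.0 / beta, N)    #per-customer dropout rates
tau = np.random.exponential(1.0 / mu)      #per-customer active lifetimes
T = 60.0                                   #observation window (in the model's time units)
x = np.random.poisson(lam * np.minimum(tau, T))  #transactions made while active
print(x)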
 

We take advantage of PyStan's blocks of code to declare and transform our data and parameters, and to define our prior and likelihood functions.

In the functions block, we combine the transaction rate and dropout rate distributions into a custom likelihood function.

In [141]:
#define the PyStan model
line_code = """

// Likelihood function
functions {
    vector llh(vector x, vector t_x, vector t_cal, 
              vector lambda1, vector mu1, 
              vector log_lambda, vector log_mu, 
              vector log_mu_lambda, int N) {
    vector[N] p_1;
    vector[N] p_2;
    p_1 = x .* log_lambda + log_mu - log_mu_lambda - t_x .* mu1 - t_x .* lambda1;
    p_2 = x .* log_lambda + log_lambda - log_mu_lambda - t_cal .* mu1 - t_cal .* lambda1;
    return(log(exp(p_1) + exp(p_2)));
  }
}
data {
  int<lower = 1> N;            // Number of customers
  int<lower = 0> N_months;     // Number of months for ltv calibration
  vector<lower = 0>[N] x;      // Repeat transactions per customer (frequency)
  vector<lower = 0>[N] t_x;    // Time of most recent transaction (recency) 
  vector<lower = 0>[N] t_cal;  // Time since first transaction 
  vector[N] mx;                // Average transaction amount
}


transformed data {
  vector<lower = 0>[N] x_tot;  // Total number of transactions per cust
  x_tot = x + 1;
}
parameters {
  vector<lower = 0>[N] lambda; // Transaction rate
  vector<lower = 0>[N] mu;     // Dropout rate
  real<lower = 0> rr;          // Transaction shape parameter
  real<lower = 0> alpha;       // Transaction scale parameter
  real<lower = 0> ss;          // Dropout shape parameter
  real<lower = 0> beta;        // Dropout scale parameter
  real <lower=0> p;            // Shape of trans amt gamma        
  vector<lower=0>[N] v;        // Scale of trans amt gamma (cust specific)
  real <lower=0> q;            // Shape of scale dist gamma  
  real <lower=0> y;            // Scale of scale dist gamma    
}
transformed parameters {
  vector[N] log_lambda; 
  vector[N] log_mu;
  vector[N] log_mu_lambda;
  vector[N] log_lik;
  vector<lower=0> [N] px;      // Shape of average spend distribution
  vector<lower=0> [N] nx;      // Rate of average spend distribution
  px = p * x_tot;  
  nx = v .* x_tot;
  log_lambda = log(lambda);
  log_mu = log(mu);
  log_mu_lambda = log(mu + lambda);
  log_lik = llh(x, t_x, t_cal, lambda, mu, log_lambda, log_mu, log_mu_lambda, N);
}
model {
  // Priors for rates
  lambda ~ gamma(rr, alpha);
  mu ~ gamma(ss, beta);
  rr ~ exponential(1);
  alpha ~ exponential(1);
  ss ~ exponential(0.1);
  beta ~ exponential(0.1);
  // Likelihood for rate
  target += log_lik;
  // Priors for spend
  p ~ exponential(0.1);    
  q ~ exponential(0.1);    
  y ~ exponential(0.1);    
  v ~ gamma(q, y); 
  // Likelihood for spend
  mx ~ gamma(px, nx); 
}
generated quantities {
  vector[N] p_alive;        // Probability that they are still "alive"    
  vector[N] exp_trans;      // Expected number of transactions 
  vector[N] mx_pred;        // Per transaction spend
  vector[N] lt_val;         // Lifetime value
  for(i in 1:N) {
    p_alive[i] = 1/(1+mu[i]/(mu[i]+lambda[i])*(exp((lambda[i]+mu[i])*(t_cal[i]-t_x[i]))-1));
    exp_trans[i] = (lambda[i]/mu[i])*(1 - exp(-mu[i]*N_months));
    mx_pred[i] = gamma_rng(px[i], nx[i]);  
  }
  lt_val = exp_trans .* mx_pred;
}
"""
In [25]:
#check the PyStan version
print("PyStan version: {}".format(pystan.__version__))
 
PyStan version: 2.17.1.0
 

Next, we compile our probabilistic model:

In [142]:
#compile the PyStan model
start = time.time()
sm = pystan.StanModel(model_code = line_code, model_name = 'CVL')
end = time.time()

print("PyStan model compiled in: {0:.2f}sec".format(end - start))
 
INFO:pystan:COMPILING THE C++ CODE FOR MODEL CVL_6fc0af7e0bb3b6e12a459d93b9b03e0c NOW.
/anaconda3/lib/python3.7/site-packages/Cython/Compiler/Main.py:367: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: /var/folders/xs/lcth53h977zd86kbyvjl5qgc0000gn/T/tmpvunuin2i/stanfit4CVL_6fc0af7e0bb3b6e12a459d93b9b03e0c_1675693954743122787.pyx
  tree = Parsing.p_module(s, pxd, full_module_name)
 
PyStan model compiled in: 42.65sec
 

We can see that the model compiles in about 43 seconds, which is a reasonable time.

 

6. Prepare the data in a specific format for the PyStan model

 

The dataset is quite big - it contains 18,004 unique customers! To reduce the model's inference time, we sample our initial dataset to create a new, smaller dataset of about 90 customers.

In [144]:
#get a sample from the data
final_df_sampled = final_df.sample(frac = 0.005)
In [151]:
#save all the sampled data to a dictionary
CLV_data = dict(x = final_df_sampled.loc[:,"x"],
                t_x = final_df_sampled.loc[:,"t_x"],
                t_cal = final_df_sampled.loc[:,"t_cal"],
                mx = final_df_sampled.loc[:,"mx"],
                N = final_df_sampled.shape[0],
                N_months = 2)
 

Finally, we hit the MCMC button!

In [152]:
#perform sampling 
fit = sm.sampling(data = CLV_data,
                  chains = 10,
                  iter = 3000,
                  warmup = 1000)
 
/anaconda3/lib/python3.7/site-packages/pystan/misc.py:399: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  elif np.issubdtype(np.asarray(v).dtype, float):
 

7. Convergence Assessment

 

Once we have performed MCMC sampling, it's time to assess the convergence of our simulation.

In [250]:
az.style.use('arviz-darkgrid')
#initialize the plot
fig, ax = plt.subplots(figsize = (12,7))

#plot marginal energy
inference_data = az.convert_to_inference_data(fit)
az.plot_energy(inference_data, ax = ax)
ax.set_title("Marginal Energy", size = 15)
Out[250]:
Text(0.5, 1.0, 'Marginal Energy')
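
Besides the marginal energy plot, it is good practice to also inspect the R-hat statistic and the effective sample size; a minimal sketch using the same inference_data object (R-hat values close to 1.0 indicate that the chains have mixed well):

#check R-hat and effective sample size for the population-level parameters
summary = az.summary(inference_data, var_names = ["rr", "alpha", "ss", "beta", "p", "q", "y"])
print(summary)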
 
 

8. Visualize the Future Transactions

 

Now, we are ready to plot, for each customer, the estimated full distributions of their future transaction rates, churn probabilities and total spending.

The plots below display the distributions of future transactions, churn probabilities and lifetime values for all of our sampled customers.

In [174]:
#load samples into a pandas dataframe
exp_trans_distribution = pd.DataFrame(data = fit.extract()["exp_trans"],
                                      columns = final_df_sampled["customer_id"].values)
In [189]:
#transform data into a specific format for visualization
exp_trans_distribution_unstacked = exp_trans_distribution.unstack()
exp_trans_distribution_unstacked = exp_trans_distribution_unstacked.to_frame(name = "exp_trans_distribution")
exp_trans_distribution_final = exp_trans_distribution_unstacked.reset_index(level = 0)\
                                                               .rename(columns = {"level_0":"customer_id"})
In [196]:
#get customer indexes
customer_indexes = exp_trans_distribution_final["customer_id"].unique()

#get the customer IDs in a nice format
customers = list(map(lambda e: "{:,}".format(e), customer_indexes))

#get the labels on the y-axis
labels = list(map(lambda e: "Customer Id: " + e ,customers))
In [274]:
fig, axes = joypy.joyplot(exp_trans_distribution_final,
                          by = "customer_id",
                          column = "exp_trans_distribution",
                          range_style = "own",
                          figsize = (10,15),
                          legend = False,
                          grid = "both",
                          fade = True,
                          linewidth = 0.5,
                          kind = "counts",
                          tails = 0.01,
                          x_range = [-0.07,1], 
                          labels = labels)
axes[-1].set_title("Expected Transactions - Next Two Months)", size = 20)
axes[1].set_ylabel("Customers", size = 19)
axes[-1].set_xlabel("Days", size = 19)
axes[-1].set_xticklabels(["0d","15d","30d","45d", "60d"])
#set the y-label position on the y-axis
axes[1].yaxis.set_label_coords(-0.295, -10)

#set the label size on the y-axis
for i, axis in enumerate(axes):
    axes[i].tick_params(axis = 'y', which = 'major', labelsize = 12.5)
    
#set the label size on the x-axis
axes[-1].tick_params(axis = 'x', which = 'major', labelsize = 14.5)
 
 

9. Visualize Churn Rate (for the same customers)

In [215]:
#get the churn probability = 1 - p_alive
churn_p_distribution = pd.DataFrame(data = 1 - fit.extract()["p_alive"],
                                    columns = final_df_sampled["customer_id"].values)
In [218]:
#get the churn distribution per customer
churn_p_dist = churn_p_distribution.unstack()\
                                   .to_frame(name = "churn_rate_dist")\
                                   .reset_index(level = 0, drop = False)\
                                   .rename(columns = {"level_0":"customer_id"})
In [286]:
fig, axes = joypy.joyplot(churn_p_dist,
                          by = "customer_id",
                          column = "churn_rate_dist",
                          range_style = "own",
                          figsize = (10,15),
                          legend = False,
                          grid = "both",
                          fade = True,
                          linewidth = 0.5,
                          kind = "counts",
                          x_range = [-0.2,1.25],
                          labels = labels)

axes[-1].set_title("Expected Churn - Next Two Months", size = 20)
axes[1].set_ylabel("Customers", size = 19)
axes[-1].set_xlabel(" ", size = 19)
axes[-1].set_xlabel("Churn Probability", size = 19)
#set the y-label position on the y-axis
axes[1].yaxis.set_label_coords(-0.25, -9)

#set the label size on the y-axis
for i, axis in enumerate(axes):
    axes[i].tick_params(axis = 'y', which = 'major', labelsize = 12.5)
    
#set the label size on the x-axis
axes[-1].tick_params(axis = 'x', which = 'major', labelsize = 14.5)
 
 

10. Visualize Customer Lifetime Value (for the same customers)

In [240]:
lt_val_distribution = pd.DataFrame(data = fit.extract()["lt_val"], 
                                   columns = final_df_sampled["customer_id"].values)
In [255]:
#get the lifetime value distribution per customer
lt_val_dist = lt_val_distribution.unstack()\
                                 .to_frame(name = "lt_val_dist")\
                                 .reset_index(level = 0)\
                                 .rename(columns = {"level_0":"customer_id"})
In [287]:
fig, axes = joypy.joyplot(lt_val_dist,
                          by = "customer_id",
                          column = "lt_val_dist",
                          range_style = "own",
                          figsize = (30,25),
                          legend = False,
                          grid = "both",
                          fade = True,
                          kind = "kde",
                          labels = labels,
                          x_range = [-0.07,5],
                          tails = 0.01)

axes[-1].set_title("Customer Lifetime Value - Next Two Months", size = 20)
axes[1].set_ylabel("Customers", size = 19)
axes[-1].set_xlabel("Revenue (€)", size = 19)
axes[-1].set_xticklabels(["0€","100€","200€","300€", "400€"])
#set the y-label position on the y-axis
axes[1].yaxis.set_label_coords(-0.25, -5)

#set the label size on the y-axis
for i, axis in enumerate(axes):
    axes[i].tick_params(axis = 'y', which = 'major', labelsize = 12.5)
    
#set the label size on the x-axis
axes[-1].tick_params(axis = 'x', which = 'major', labelsize = 14.5)
 
 

11. Conclusions

 

The customer lifetime value (CLV) metric might not sound very important, but failing to calculate it can put you behind your competitors.

CLV tells you how well you’re resonating with your audience, how much your customers like your products or services, and what you’re doing right — as well as how you can improve. It is essentially a measurement of how much a business’s customers are worth over their lifetime.

Despite its importance, most companies are doing it wrong. At face value, CLV is an easy concept to understand. In practice though, it’s deceptively hard to implement in a way that accurately captures the variation in customer behavior.

Most companies calculate their customers' CLV in aggregate, using segments such as acquisition channel or other broad categories based on geography, sales channels, or demographics. From our experience, such an approach can lead to large unseen biases and therefore to an incorrect picture of the health of the business, which can subsequently result in misallocated resources, misguided strategies, and missed revenue opportunities.

In this blogpost we showed how, by using a simple Bayesian model, our DS team managed to overcome the problems mentioned above and deliver an easy-to-deploy and highly interpretable analytics solution to one of the biggest players in the Retail Industry. And the best of all: by estimating CLV with individual-level granularity, you automatically get two bonus metrics: the expected transactions and the churn probability of each customer.

In the end, I think we all agree that CLV is so valuable to every business that putting in the time and study to estimate it properly is definitely worth it.
