# Predicting Customer Lifetime Value in Python

## Learn how to calculate Customer Life Time Value in Python.

Italian economist Vilfredo Pareto states that 80% of the effect comes from 20% of the causes, this is known as 80/20 rule or Pareto principle. Similarly, 80% of companies’ business comes from 20% of customers. Companies need to identify those top customers and maintain a relationship with them to ensure continuous revenue. In order to maintain a long-term relationship with customers, companies need to schedule loyalty schemes such as the discount, offers, coupons, bonus point, and gifts.

Targetting a new customer is more costly than retaining existing customers because you don’t need to spend resources, time, and work hard to acquire new customers. You just have to keep the existing customers happy. Business analyst accurately calculates customer acquisition cost using CLTV(Customer Lifetime Value). CLTV indicates the total revenue from the customer during the entire relationship. CLTV helps companies to focus on those potential customers who can bring in more revenue in the future.

For more such tutorials and courses visit DataCamp:

In this tutorial, you are going to cover the following topics:

• Introduction
• Customer Life Time value(CLTV)
• Related Work of CLTV
• CLTV Formulas
• Implementing CLTV in Python
• Prediction model for CLTV
• Pros and Cons
• Conclusion

Photo by Austin Distel on Unsplash

# Customer Life Time value(CLTV)

Customer Lifetime Value is a monetary value that represents the amount of revenue or profit a customer will give the company over the period of the relationship. CLTV demonstrates the implications of acquiring long-term customers compare to short-term customers. Customer lifetime value (CLV) can help you to answers the most important questions about sales to every company:

• How to Identify the most profitable customers?
• How can a company offer the best product and make the most money?
• How to segment profitable customers?
• How much budget needs to spend to acquire customers?

Originally published at https://www.datacamp.com/community/tutorials/customer-life-time-value

# Calculate Customer Life Time Value

There are lots of approaches available for calculating CLTV. Everyone has his/her own view on it. For computing CLTV we need historical data of customers but you will unable to calculate for new customers. To solve this problem Business Analyst develops machine learning models to predict the CLTV of newly customers. Let’s explore some approaches for CLTV Calculation:

1. You can compute it by adding profit/revenue from customers in a given cycle. For Example, If the customer is associated with you for the last 3 years you can sum all the profit in these 3 years. you can average the profit yearly or half-yearly or monthly. but in this approach, you cannot able to build a predictive model for new customers.
2. Build a regression model for existing customers. Take recent six-month data as independent variables and total revenue over three years as a dependent variable and build a regression model on this data.
3. CLTV can also implement using RFM(Recency, Frequency, Monetary) values. for more detail, you can refer to my tutorial: https://www.datacamp.com/community/tutorials/introduction-customer-segmentation-python
4. Using the following equation
• Average Order Value(AOV): The Average Order Value is the ratio of your total revenue and the total number of orders. AOV represents the mean amount of revenue that the customer spends on an order.

`Average Order Value = Total Revenue / Total Number of Orders`

• Purchase Frequency(PF): Purchase Frequency is the ratio of the total number of orders and the total number of customers. It represents the average number of orders placed by each customer.

`Purchase Frequency = Total Number of Orders / Total Number of Customers`

• Churn Rate: Churn Rate is the percentage of customers who have not ordered again.
• Customer Lifetime: Customer Lifetime is the period of time that customer has been continuously ordering.

`Customer Lifetime=1/Churn Rate`

• Repeat Rate: Repeat rate can be defined as the ratio of the number of customers with more than one order to the number of unique customers. Example: If you have 10 customers in a month out of who 4 come back, your repeat rate is 40%.

`Churn Rate= 1-Repeat Rate`

# Removing Duplicates

Sometimes you get a messy dataset. You may have to deal with duplicates, which will skew your analysis. In python, pandas offer function drop_duplicates(), which drops the repeated or duplicate records.

# Let’s Jump into Data Insights

Count Plot

In the given dataset, you can observe most of the customers are from the “United Kingdom”. So, you can filter data for United Kingdom customers.

The describe() function in pandas is convenient in getting various summary statistics. This function returns the count, mean, standard deviation, minimum and maximum values, and the quantiles of the data.

Here, you can observe some of the customers have ordered in a negative quantity, which is not possible. So, you need to filter Quantity greater than zero.

# Filter required Columns

Here, you can filter the necessary columns for calculating CLTV. You only need her five columns CustomerID, InvoiceDate, InvoiceNo, Quantity, and UnitPrice.

• CustomerID will uniquely define your customers.
• InvoiceNo helps you to count the number of time transaction performed(frequency).
• Quantity is purchased item units in each transaction
• UnitPrice of each unit purchased by the customer will help you to calculate the total purchased amount.

Here, you are going to perform the following operations:

• Calculate the number of days between the present date and the date of last purchase from each customer.
• Calculate the number of orders for each customer.
• Calculate the sum of the purchase price for each customer.

# Calculate CLTV using the following formula:

1. Calculate the Average Order Value

2. Calculate Purchase Frequency

3. Calculate Repeat Rate and Churn Rate

4. Calculate Profit Margin

Profit margin is the commonly used profitability ratio. It represents how much percentage of total sales has earned as the gain. Let’s assume our business has approx 5% profit on the total sale.

# Prediction Model for CLTV

Let’s build the CLTV prediction model. Here, you are going to predict CLTV using Linear Regression Model. Let’s first use the data loaded and filtered above.

Extract month and year from InvoiceDate.

The pivot table takes the columns as input, and groups the entries into a two-dimensional table in such a way that provides a multidimensional summarization of the data.

Let’s sum all the month’s sales.

# Selecting Feature

Here, you need to divide the given columns into two types of variables dependent(or target variable) and independent variable(or feature variables). Select the latest 6 months as an independent variable.

# Splitting Data

To understand model performance, dividing the dataset into a training set and a test set is a good strategy.

Let’s split dataset by using function train_test_split(). You need to pass 3 parameters features, target, and test_set size. Additionally, you can use random_state as a seed value to maintain reproducibility, which means whenever you split the data will not affect the results. Also, if random_state is None, then the random number generator uses np.random for selecting records randomly. It means If you don’t set seed, it is different each time.

# Model Development

First, import the Linear Regression module and create a Linear Regression object. Then, fit your model on the train set using fit() function and perform prediction on the test set using predict() function.

# How Well Does the Model Fit the data?

In order to evaluate the overall fit of the linear model, we use the R-squared value. R-squared is the proportion of variance explained by the model. Value of R-squared lies between 0 and 1. Higher value or R-squared is considered better because it indicates the larger variance explained by the model.

This model has a higher R-squared (0.96). This model provides a better fit to the data.

# Model Evaluation

For regression problems following evaluation metrics used (Ritchie Ng):

• Mean Absolute Error (MAE) is the mean of the absolute value of the errors.
• Mean Squared Error (MSE) is the mean of the squared errors.
• Root Mean Squared Error (RMSE) is the square root of the mean of the squared errors.

RMSE is more popular than MSE and MAE because RMSE is interpretable with y because of the same units.

# Pros and Cons of CLTV

CLTV helps you to design an effective business plan and also provide a chance to scale your business. CLTV draw meaningful customer segments these segment can help you to identify the needs of the different-different segment.

Customer Lifetime Value is a tool, not a strategy. CLTV can figure out the most profitable customers, but how you are going to make a profit from them, it depends on your strategy. Generally, CLTV models are confused and misused. Obsession with CLTV may create blinders. Companies only focus on finding the best customer group and focusing on them and repeat the business, but it’s also important to give attention to other customers.

# Conclusion

Congratulations, you have made it to the end of this tutorial!

In this tutorial, you have covered a lot of details about Customer Lifetime Value. You have learned what customer lifetime value is, approaches for calculating CLTV, implementation of CLTV from scratch in python, a prediction model for CLTV, and Pros and Cons of CLTV. Also, you covered some basic concepts of pandas such as groupby and pivot table for summarizing selected columns and rows of data.

Hopefully, you can now utilize the CLTV concept to analyze your own datasets. Thanks for reading this tutorial!

Originally published at https://www.datacamp.com/community/tutorials/customer-life-time-value

Do you want to learn data science, check out on DataCamp.

For more such article, you can visit my blog Machine Learning Geek

Sr Data Scientist| Analytics Consulting | Data Science Communicator | Helping Clients to Improve Products & Services with Data

## More from Avinash Navlani

Sr Data Scientist| Analytics Consulting | Data Science Communicator | Helping Clients to Improve Products & Services with Data

## Generating Statistics in A.I. and Science

Get the Medium app