
    Marketing Analytics: Marketing AI | Finding Cross-sell opportunities with Machine Learning

    In this marketing AI use case, let us assume that a consumer bank wants to cross-sell its insurance product to customers who hold a secured or unsecured loan. Based on each customer’s historical behavior and portfolio, the bank wants to apply analytics and marketing AI to decide:

    1. Finding the right customer: who is likely to buy the insurance product?
    2. Finding the right product: which type of PPI insurance product should be offered to whom?

    The bank has provided customer data with features such as credit score, outstanding loan amount, residential status, demographic information (age group, income group, etc.), and whether the customer already holds any insurance and, if so, which type of insurance product.

    Let us understand the sample data first.

    The sample contains the whole dataset. From this original dataset, we also need to extract only the customers with PPI = 1, that is, those already insured with a PPI; this subset is used later to find the right product.
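    Selecting the insured customers is a one-line filter in pandas. The sketch below assumes the data has been loaded into a DataFrame with a 0/1 `PPI` column. The toy rows and the `customer_id` and `PPI_category` column names are hypothetical:

```python
import pandas as pd

# Hypothetical toy frame; the real column names in the bank's dataset may differ.
df = pd.DataFrame({
    "customer_id": [101, 102, 103, 104, 105],
    "PPI": [1, 0, 1, 0, 1],
    "PPI_category": ["Single", None, "Joint", None, "Lci"],
})

# Keep only customers already insured with a PPI (PPI == 1);
# this subset is the training data for the product-recommendation model.
ppi_customers = df.loc[df["PPI"] == 1].reset_index(drop=True)
print(ppi_customers)
```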

    Marketing AI: Feature Engineering (Personal Protection Plan/PPI):

    Observation 1: There are no duplicate rows in the dataset.

    Observation 2: The target variable “PPI” is balanced, so no class rebalancing is needed.
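    Both checks are quick to reproduce with pandas. This is a minimal sketch on a hypothetical sample with hypothetical column names. `duplicated()` flags fully repeated rows, and `value_counts(normalize=True)` gives the share of each target class:

```python
import pandas as pd

# Hypothetical sample; replace with the bank's customer table.
df = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5, 6],
    "credit_score": [710, 780, 650, 790, 720, 805],
    "PPI": [1, 0, 1, 0, 0, 1],
})

# Observation 1: count fully duplicated rows.
n_duplicates = int(df.duplicated().sum())

# Observation 2: share of each target class; roughly equal shares mean a balanced target.
class_share = df["PPI"].value_counts(normalize=True)

print("duplicate rows:", n_duplicates)
print(class_share)
```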

    Observation 3:

    • Gender is not equally distributed: male customers are more likely than female customers to buy PPI insurance.
    • Similarly, married customers are the most promising prospects, followed by single customers.
    • Among the three PPI products, PPI Joint and PPI Lci are the underperforming ones.
    • PPI Single is the most popular insurance category, covering roughly 60-70% of the total.
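    Observations like these come from purchase rates per group. Because `PPI` is a 0/1 flag, its mean within a group is the share of buyers. The sketch below uses a hypothetical toy sample and hypothetical column names, so its numbers are illustrative only:

```python
import pandas as pd

# Hypothetical sample: gender, marital status, PPI flag and PPI product type.
df = pd.DataFrame({
    "gender": ["M", "M", "F", "M", "F", "M", "F", "M"],
    "marital_status": ["Married", "Single", "Married", "Married",
                       "Single", "Single", "Divorced", "Married"],
    "PPI": [1, 0, 0, 1, 0, 1, 0, 1],
    "PPI_category": ["Single", None, None, "Joint", None, "Single", None, "Single"],
})

# Purchase rate per group: the mean of a 0/1 flag is the share of buyers.
rate_by_gender = df.groupby("gender")["PPI"].mean()
rate_by_marital = df.groupby("marital_status")["PPI"].mean().sort_values(ascending=False)

# Product mix among buyers only.
product_mix = df.loc[df["PPI"] == 1, "PPI_category"].value_counts(normalize=True)

print(rate_by_gender, rate_by_marital, product_mix, sep="\n\n")
```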

    Observation 4:

    • The Age box-whisker plot shows that middle-aged people (35-50 years) are the most likely to buy PPI insurance.
    • The income group box plot is positively skewed; customers in income groups above 3 appear to have a higher chance of buying PPI insurance.
    • The Mosaic box plot is roughly normally distributed, but there are some outliers (abnormal values) in the data.
    • Customers with a mid-range Mosaic value (a lifestyle classification) between 20 and 40, or with Mosaic class 4 or 8, are most likely to buy PPI.
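    The outliers in a box-whisker plot are conventionally the points beyond 1.5 × IQR from the quartiles. This is the default whisker rule in matplotlib and seaborn. A minimal sketch on hypothetical Mosaic scores:

```python
import pandas as pd

# Hypothetical Mosaic scores; the real column name may differ.
mosaic = pd.Series([22, 25, 28, 30, 31, 33, 35, 38, 40, 95], name="mosaic")

# Box-plot (Tukey) rule: points beyond 1.5 * IQR from the quartiles are drawn as outliers.
q1, q3 = mosaic.quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = mosaic[(mosaic < lower) | (mosaic > upper)]
print(f"bounds: [{lower:.1f}, {upper:.1f}]", outliers.tolist())
```

    Whether such points are capped, removed or kept is a modelling choice. Tree-based models are usually less sensitive to them than distance-based ones such as KNN.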

    Observation 5:

    • Customers with a credit score above 750 look like stronger PPI prospects.
    • Customers are likely to purchase PPI when the loan type is unsecured and the outstanding balance is more than 50K.
    • Likewise, homeowners are the most likely to buy the insurance, followed by tenants.
    • Customers who have stayed with the bank for more than 100 months are likely to take up PPI.
    • Loan grades appear to be ordered alphabetically; A and X are the best-performing grades.
    • Time at Address shows a gradual (not very sharp) decline after 150 months.
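    Threshold findings such as "credit score above 750" or "unsecured loan with balance above 50K" can be checked by binning or flagging the continuous variables and comparing purchase rates. This is a sketch on a hypothetical sample. The column names are hypothetical, and the cut-offs are the ones quoted above:

```python
import pandas as pd

# Hypothetical sample of credit score, loan type, balance and PPI flag.
df = pd.DataFrame({
    "credit_score": [620, 700, 760, 780, 810, 690, 755, 730],
    "loan_type": ["Secured", "Unsecured", "Unsecured", "Secured",
                  "Unsecured", "Secured", "Unsecured", "Unsecured"],
    "outstanding_balance": [20_000, 60_000, 75_000, 30_000, 90_000, 15_000, 55_000, 40_000],
    "PPI": [0, 0, 1, 0, 1, 0, 1, 0],
})

# Bin the continuous credit score around the 750 cut-off: (0, 750] and (750, 900].
df["score_band"] = pd.cut(df["credit_score"], bins=[0, 750, 900], labels=["<=750", ">750"])
rate_by_score = df.groupby("score_band", observed=True)["PPI"].mean()

# Purchase rate for "unsecured loan and balance > 50K" (True) versus everyone else (False).
segment = (df["loan_type"] == "Unsecured") & (df["outstanding_balance"] > 50_000)
rate_by_segment = df.groupby(segment)["PPI"].mean()
print(rate_by_score, rate_by_segment, sep="\n\n")
```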

    Observation 6:

    • The Worst Status flag can take six different values, indicating how many months a customer has gone without paying dues after the due date.
    • Customers with a Worst Status flag of 0 (and possibly 1) are likely to buy PPI insurance.
    • CIFAS = Y appears to be a suspicious-activity flag; these customers should be reviewed by the bank’s credit risk team.
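    These two flags translate naturally into a pre-filter before any model scoring. This is a sketch only. It assumes the six Worst Status values are coded 0-5 and that CIFAS is stored as "Y"/"N"; both codings and the column names are hypothetical:

```python
import pandas as pd

# Hypothetical columns: worst_status (assumed 0-5, months in arrears) and a CIFAS flag ("Y"/"N").
df = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5],
    "worst_status": [0, 1, 3, 0, 5],
    "cifas": ["N", "N", "N", "Y", "N"],
})

# Customers flagged CIFAS = Y go to the credit risk team instead of the campaign.
for_review = df[df["cifas"] == "Y"]

# Campaign pool: no CIFAS flag and a worst status of 0 or 1.
eligible = df[(df["cifas"] != "Y") & (df["worst_status"] <= 1)]
print(eligible["customer_id"].tolist(), for_review["customer_id"].tolist())
```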

    Marketing Analytics – Machine Learning Model:

    Model Training & Testing:

    70% of the data is used for model training and 30% for testing. Before applying the machine learning models, let us do the exploratory analysis and data preparation using these steps:

    • Missing value treatment & duplicate check
    • Distribution analysis of categorical (converted to numerical) and continuous variables
    • Feature selection (correlation analysis) & Scaling (MinMax)
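    The preparation steps and the 70/30 split can be sketched with pandas and scikit-learn. The data here is synthetic and every column name is hypothetical. Median imputation, one-hot encoding and the 0.9 correlation cut-off are illustrative assumptions, not necessarily the author's choices:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

rng = np.random.default_rng(0)
n = 200
# Synthetic stand-in for the bank data; column names are hypothetical.
df = pd.DataFrame({
    "credit_score": rng.integers(500, 900, n).astype(float),
    "outstanding_balance": rng.integers(1_000, 100_000, n).astype(float),
    "age": rng.integers(18, 75, n).astype(float),
    "residential_status": rng.choice(["Owner", "Tenant", "Other"], n),
    "loan_type": rng.choice(["Secured", "Unsecured"], n),
    "PPI": rng.integers(0, 2, n),
})
df.loc[::17, "credit_score"] = np.nan  # inject some missing values

# Duplicate check.
df = df.drop_duplicates()

# Correlation analysis: drop one of any pair of numeric features with |r| > 0.9.
numeric = ["credit_score", "outstanding_balance", "age"]
corr = df[numeric].corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [c for c in upper.columns if (upper[c] > 0.9).any()]
numeric = [c for c in numeric if c not in to_drop]
categorical = ["residential_status", "loan_type"]

X, y = df[numeric + categorical], df["PPI"]
# 70% training / 30% testing, stratified so both splits keep the class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

preprocess = ColumnTransformer([
    # Missing-value treatment (median), then MinMax scaling to [0, 1].
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", MinMaxScaler())]), numeric),
    # Convert categorical variables to numeric indicator columns.
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])
X_train_t = preprocess.fit_transform(X_train)  # fit on training data only
X_test_t = preprocess.transform(X_test)
print(X_train_t.shape, X_test_t.shape)
```

    The imputer and scaler are fitted on the training split only, so no information from the 30% test set leaks into preprocessing.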

    Model 1: Finding the right customers (PPI = 1 or 0) is a binary classification problem:

    Three models are used here:

    1. LogisticRegression (penalty and class weights used)
    2. RandomForest (grid search and basic hyperparameter tuning used)
    3. XGBoost (basic hyperparameter tuning used)

    The models are compared on the following metrics:

    • Accuracy – % of customers that the model correctly classifies as having or not having purchased PPI
    • Precision – of all the customers the model predicts will purchase PPI, the % who actually bought it
    • Recall – of all the customers who actually purchased PPI, the % the model successfully identifies
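    The comparison can be sketched with scikit-learn on synthetic data, since the bank's data is not available. One substitution should be stated plainly: to keep the sketch dependency-free, scikit-learn's `GradientBoostingClassifier` stands in for XGBoost. `xgboost.XGBClassifier` exposes the same `fit`/`predict_proba` interface and could be swapped in. The hyperparameter values and grid are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the preprocessed customer features (not the bank's data).
X, y = make_classification(n_samples=1000, n_features=12, n_informative=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

models = {
    # Default L2 penalty (strength set by C) with balanced class weights.
    "logistic_regression": LogisticRegression(C=1.0, class_weight="balanced", max_iter=1000),
    # Small grid search over two Random Forest hyperparameters.
    "random_forest": GridSearchCV(
        RandomForestClassifier(random_state=0),
        param_grid={"n_estimators": [100, 200], "max_depth": [5, None]},
        scoring="roc_auc", cv=3,
    ),
    # Stand-in for XGBoost (gradient-boosted trees from scikit-learn).
    "gradient_boosting": GradientBoostingClassifier(
        n_estimators=200, learning_rate=0.1, max_depth=3, random_state=0
    ),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    # P(PPI = 1); sklearn.metrics.roc_curve(y_test, proba) gives the points to plot.
    proba = model.predict_proba(X_test)[:, 1]
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred),
        "recall": recall_score(y_test, pred),
        "roc_auc": roc_auc_score(y_test, proba),
    }

for name, scores in results.items():
    print(name, {k: round(v, 3) for k, v in scores.items()})
```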

    Reflection: Let us compare accuracy, recall, and the ROC curves, and pick the best model:

    • In this use case, false negatives (customers likely to buy who do not get an offer) are still acceptable, but false positives (customers unlikely to buy who get a PPI offer) are not, because each false positive incurs extra campaign/sales cost.
    • Hence, precision and recall are more appropriate measures than raw accuracy for this use case; precision in particular directly penalizes false positives.
    • Comparing the ROC curves, XGBoost performs a little better than the Random Forest model (curves closer to the top-left corner indicate better performance), so we can go ahead with either (i) Random Forest or (ii) XGBoost.
    • The top 10 influencing features (by feature importance) should be explored by the marketing team to optimize the campaigns.
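    Tree ensembles expose a ranked list of influencers through their fitted `feature_importances_`. This is a sketch on synthetic data with hypothetical feature names. The impurity-based importances shown here can favor high-cardinality features; `sklearn.inspection.permutation_importance` is a common cross-check:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data with hypothetical feature names standing in for the bank's columns.
X, y = make_classification(n_samples=500, n_features=15, n_informative=5, random_state=1)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

model = RandomForestClassifier(n_estimators=200, random_state=1).fit(X, y)

# Impurity-based importances, one per feature, normalized to sum to 1.
importances = pd.Series(model.feature_importances_, index=feature_names)
top10 = importances.sort_values(ascending=False).head(10)
print(top10)
```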

    Model 2: Finding the right PPI product among the PPI categories [Single, Joint, Lci] is a multi-class classification problem:

    Two models are used here:

    1. KNN Classifier (n_neighbors hyperparameter)
    2. Decision Tree Classifier (Gini criterion and max depth)
    • Customers who have bought PPI (only PPI = 1) are grouped together; for them, the target variable PPI category has 3 classes: (i) Single, (ii) Joint, (iii) Lci.
    • So, multi-class classification models [(a) KNN and (b) Decision Tree] are developed to determine which type of product should be offered to which customers.
    • Actual vs. predicted categories can be analyzed side by side.
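    A minimal sketch of the two multi-class models, using synthetic 3-class data in place of the PPI = 1 subset. The values `n_neighbors=7` and `max_depth=5` are illustrative assumptions. KNN is wrapped with MinMax scaling because it is distance-based, and the multi-class ROC AUC uses the one-vs-rest averaging that `roc_auc_score` supports:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic 3-class stand-in for the PPI = 1 subset (not the bank's data).
X, y_idx = make_classification(n_samples=600, n_features=10, n_informative=5,
                               n_classes=3, random_state=0)
classes = ["Single", "Joint", "Lci"]
y = pd.Series(y_idx).map(dict(enumerate(classes)))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

models = {
    # KNN is distance-based, so features are MinMax-scaled first.
    "knn": make_pipeline(MinMaxScaler(), KNeighborsClassifier(n_neighbors=7)),
    "decision_tree": DecisionTreeClassifier(criterion="gini", max_depth=5, random_state=0),
}

# Actual vs. predicted categories, side by side.
side_by_side = pd.DataFrame({"actual": y_test.to_numpy()})
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    side_by_side[name] = pred
    auc = roc_auc_score(y_test, model.predict_proba(X_test), multi_class="ovr")
    print(name,
          "accuracy=%.3f" % accuracy_score(y_test, pred),
          "macro_recall=%.3f" % recall_score(y_test, pred, average="macro"),
          "ovr_auc=%.3f" % auc)

print(side_by_side.head())
```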

    Reflection: Let us compare accuracy, recall, and the ROC curves, and pick the best model:

    • Both the KNN and Decision Tree multi-class classifiers perform only moderately well.
    • The Decision Tree accuracy (~69%) is better than that of the KNN model.
    • We opt for the Decision Tree model here. Other models such as SVM could also be built, but tree-based models usually work better than SVM when the data volume is small.

    Marketing AI model conclusion: an AI- and analytics-driven approach to building targeted product offerings

    1. Customers who do not usually miss payments can be targeted for the insurance cross-sell. Customers with the CIFAS (fraud suspicion) flag set to ‘Y’ must be reviewed by the bank before being offered PPI, as they have missed the payment deadline several times.
    2. Young customers below 35 years of age seem uninterested in PPI, perhaps due to less experience or a lower maturity level.
    3. Of all PPI products, “PPI Single” is the most popular insurance category compared with “PPI Lci” and “PPI Joint”; however, PPI Joint can be targeted at married customers, since it has cross-selling potential.
    4. For finding the right product, both the KNN and Decision Tree classifiers perform reasonably; of the two, the Decision Tree classifier achieves better accuracy than the KNN classifier.
    5. For finding the right customer, comparing the ROC curves shows that the XGBoost classifier performs better than Random Forest and Logistic Regression. However, both of the leading models have to be retrained with more data in the coming months, with hyperparameter tuning, to pick the best one.
    6. Typically, this marketing campaign runs every 2-3 weeks: 20-25% of the target base receives the new offer, and those customers are removed from the campaign list until the next few campaigns have run. As this continues, the marketing data should be analyzed further to assess what percentage of the targeted customers bought the offer, and the models need to be retrained on a quarterly basis. This is where AI brings real value to marketing campaigns.
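    One campaign cycle can be sketched as follows. Customers are scored, the top 20% of the base is taken from those not recently contacted, conversion is measured, and the contacted customers are marked for the cool-off period. Everything here is hypothetical: the scores are random, the `recently_contacted` flag is invented for illustration, and the purchase outcome is simulated rather than read from a sales system:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
# Hypothetical scored customer base: model probability of buying PPI and a
# flag for customers contacted in recent campaigns (cool-off period).
base = pd.DataFrame({
    "customer_id": np.arange(1, 101),
    "p_buy": rng.random(100),
    "recently_contacted": rng.random(100) < 0.3,
})

# Target 20% of the full base, chosen by score among customers not recently contacted.
pool = base[~base["recently_contacted"]]
n_target = int(0.20 * len(base))
campaign = pool.nlargest(n_target, "p_buy")

# After the campaign: conversion = share of targeted customers who bought the offer.
# The 'bought' outcome would come from the sales system; it is simulated here.
bought = rng.random(len(campaign)) < campaign["p_buy"].to_numpy()
campaign = campaign.assign(bought=bought)
conversion_rate = campaign["bought"].mean()

# Exclude the contacted customers from the next few runs.
base.loc[base["customer_id"].isin(campaign["customer_id"]), "recently_contacted"] = True
print(f"targeted={len(campaign)}, conversion={conversion_rate:.1%}")
```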


    For the detailed Python code, visit the GitHub link: