Factors Affecting Customers’ Interests in Vehicle Insurance

Hans Kristian
4 min read · Jun 21, 2021

A data-driven approach using data from a health insurance company

Source: Walnut

Introduction

Insurance is a contract, represented by a policy, in which an individual or entity receives financial protection or reimbursement against losses from an insurance company. The company pools clients’ risks to make payments more affordable for the insured. The amount of money charged by the insurer to the policyholder for the coverage outlined in the insurance policy is called the premium.

Vehicle insurance (a.k.a. auto insurance) is a contract between you and an insurance company in which you agree to pay premiums in exchange for protection against financial losses stemming from an accident or other damage to the vehicle.

A model that predicts whether a customer is interested in vehicle insurance is useful because it helps the marketing department plan customer engagement strategies. A better marketing plan can lead to increased efficiency and higher profitability for the company.

To build the model, we will use a dataset provided by an insurance company that has sold health insurance products to its customers. The dataset and its description are available on Kaggle. We will analyze the data to find correlations between variables before building machine learning models to make predictions.

Part I: How Do ‘Previously Insured’ and ‘Vehicle Damage’ Indicators Affect Customers’ Interests?

Two features we’d like to scrutinize are the ‘Previously Insured’ and ‘Vehicle Damage’ indicators. The ‘Previously Insured’ indicator tells us whether a customer already has vehicle insurance, while the ‘Vehicle Damage’ indicator tells us whether a customer’s vehicle has been damaged in the past.

The chart above shows that customers who don’t have vehicle insurance but have experienced vehicle damage before are more likely to be interested than any other group. One plausible explanation is that, having been uninsured, they know how devastating the financial loss from vehicle damage can be.

On the other hand, customers who already have vehicle insurance tend not to be interested, especially if they haven’t experienced any vehicle damage. They may be satisfied with their current insurance and not considering a new policy.
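The comparison described above amounts to computing the share of interested customers in each combination of the two indicators. Below is a minimal sketch of that calculation using only the Python standard library. The rows are made-up toy values, not the Kaggle data, and the encodings (0/1 for ‘Previously Insured’, "Yes"/"No" for ‘Vehicle Damage’) and the function name are assumptions for illustration.

```python
from collections import defaultdict

# Toy rows standing in for the real dataset; the values are made up.
# Each tuple is (previously_insured, vehicle_damage, interested).
rows = [
    (0, "Yes", 1), (0, "Yes", 0), (0, "Yes", 1), (0, "Yes", 0),
    (0, "No", 0), (0, "No", 0),
    (1, "Yes", 0), (1, "Yes", 0),
    (1, "No", 0), (1, "No", 0),
]


def interest_rate_by_group(rows):
    """Return {(previously_insured, vehicle_damage): share interested}."""
    totals = defaultdict(int)
    interested = defaultdict(int)
    for insured, damage, response in rows:
        key = (insured, damage)
        totals[key] += 1
        interested[key] += response
    return {key: interested[key] / totals[key] for key in totals}


rates = interest_rate_by_group(rows)
for key, rate in sorted(rates.items()):
    print(key, f"{rate:.0%}")
```

On a real data frame, a grouped mean over the two indicator columns would give the same table; the plain-Python version is shown only to make the arithmetic explicit.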

Part II: Are Long-Standing Customers More Interested in Vehicle Insurance Than Newer Customers?

The number of days a customer has been associated with the company is stored in the ‘Vintage’ column. We will divide the data into bins and analyze the proportion of interested customers in each bin.

There is no identifiable pattern or trend in the chart above, and the proportion is approximately the same for every bin. Hence, we can conclude that long-standing customers are about as likely to be interested as newer customers.
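The binning step can be sketched as follows. This is an illustrative version only: the bin width of 50 days is an assumption (the article does not state the bin size used), the records are made up, and the function name is hypothetical.

```python
from collections import defaultdict


def interest_rate_by_bin(records, bin_width=50):
    """Group (vintage_days, interested) pairs into fixed-width bins.

    Returns {(bin_start, bin_end): share interested}; bin_end is exclusive.
    """
    totals = defaultdict(int)
    interested = defaultdict(int)
    for vintage_days, response in records:
        # Integer division maps each customer to the start of their bin.
        start = (vintage_days // bin_width) * bin_width
        totals[start] += 1
        interested[start] += response
    return {
        (start, start + bin_width): interested[start] / totals[start]
        for start in sorted(totals)
    }


# Made-up records: (days associated with the company, 1 if interested).
records = [(12, 0), (40, 1), (75, 0), (90, 0), (130, 1), (149, 0)]
bin_rates = interest_rate_by_bin(records)
for bounds, rate in bin_rates.items():
    print(bounds, f"{rate:.0%}")
```

If the proportions are roughly flat across bins, as in the chart, tenure with the company carries little information about interest.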

Part III: How Well Can We Predict Customers’ Interests Based on Customers’ Data?

After testing several machine learning models on the training data, we found that the Logistic Regression model yields the best result, with 28% precision (of 100 customers predicted to be interested, 28 are actually interested) and 94% recall (of 100 interested customers, 94 are predicted to be interested).
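For readers less familiar with these two metrics, here is a minimal sketch of how precision and recall are computed from true and predicted labels. The labels below are a made-up toy example, not the model’s output, and the function name is hypothetical.

```python
def precision_recall(y_true, y_pred):
    """Compute precision and recall for binary labels (1 = interested)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # flagged, not interested
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # interested, missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy labels: 3 interested customers, 4 predicted as interested.
y_true = [1, 1, 0, 0, 0, 1]
y_pred = [1, 1, 1, 1, 0, 0]
precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A model with high recall and low precision, like the one described here, misses few interested customers but also flags many who are not interested.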

Although the precision is quite low, it is acceptable: random guessing would yield only 12% precision, so the model’s precision is more than twice that baseline.
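The baseline comparison can be made concrete with a small calculation. When predictions are made independently of the true label, the expected precision equals the share of interested customers in the population. The sketch below assumes an illustrative population of 100 customers with 12 interested, matching the 12% figure above; the function names are hypothetical.

```python
def random_guess_precision(y_true):
    """Expected precision of random guessing: the share of positives."""
    return sum(y_true) / len(y_true)


def lift(model_precision, baseline_precision):
    """How many times more precise the model is than the baseline."""
    return model_precision / baseline_precision


# Illustrative population: 12 interested customers out of 100.
population = [1] * 12 + [0] * 88
baseline = random_guess_precision(population)
model_lift = lift(0.28, baseline)
print(f"baseline={baseline:.2f} lift={model_lift:.2f}")
```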

The chart above shows the most important features learned by the model, with the first being the ‘Previously Insured’ indicator and the second being the ‘Vehicle Damage’ indicator. This result is consistent with our analysis from Part I. Other important features for predicting customers’ interests are customers’ age and certain policy sales channels.

Conclusion

In this article, we analyzed some factors that affect customers’ interests in vehicle insurance.

  1. Customers who don’t have any vehicle insurance but have experienced vehicle damage before are the most likely to be interested in vehicle insurance.
  2. Long-standing customers are about as likely to be interested in vehicle insurance as newer customers.
  3. The Logistic Regression model yields decent results for predicting customers’ interests in the new vehicle insurance.

The findings here are observational, not the result of a formal study.

For more on this analysis, see the link to my GitHub.
