Motor Insurance Claim Frequency Prediction Using XGBoost

Naomi Kollongei *

Maseno University, Kenya.

Fredrick Onyango

Department of Statistics and Actuarial Science, Maseno University, Kenya.

*Author to whom correspondence should be addressed.


Abstract

Modelling insurance claim frequency is an important task for non-life insurers; together with other variables, claim frequency forms a key part of product pricing and risk management. Traditional frequency models such as the Poisson, Negative Binomial and Zero-Inflated models have several weaknesses, including scalability issues on large datasets, overdispersion and independence assumptions, and are therefore not ideal for complex and unstructured data. Extreme Gradient Boosting (XGBoost) is an ensemble learning algorithm that can effectively handle large, complex and unstructured insurance data. XGBoost builds tree-based models by iteratively fitting decision trees to the residuals of the previous predictions, reducing the error at each iteration. This research applied the XGBoost algorithm to a large motor insurance claims dataset in order to predict claim frequencies of 0, 1, 2 and 3. The aim was to improve prediction accuracy and so yield better estimates for risk assessment and the pricing of insurance products. Cross-validation was performed to assess the out-of-sample performance of the model. The cross-validation results showed that the XGBoost model for claim frequency had an RMSE of 0.949, an MAE of 0.7741 and an R-squared (RSQ) of 0.781. We interpret these values as strong predictive performance with a low average prediction error, and the RSQ of 0.781 further suggests that the model explained a significant proportion of the variability in the insurance data. The model was also evaluated with a confusion matrix, which showed that 99.59% of frequency 0 cases, 94.01% of frequency 1 cases, 84.80% of frequency 2 cases and only 40.96% of frequency 3 cases were correctly predicted. These results highlight the potential of XGBoost as a robust modelling technique for handling big data and accurately predicting insurance claim frequency. The results corroborate other studies reporting that XGBoost is an invaluable tool for insurance companies.
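
The idea of iteratively fitting trees to residuals can be illustrated with a minimal sketch. The code below is not the model used in this study and does not call the XGBoost library: it assumes plain squared-error boosting with one-split trees (stumps) on a single feature, and it omits the regularisation and second-order terms that XGBoost adds. The toy data and the names `fit_stump`, `boost`, `n_rounds` and `learning_rate` are hypothetical and chosen only for illustration.

```python
from math import sqrt


def fit_stump(x, residuals):
    """Return (threshold, left_mean, right_mean) of the one-split tree
    that minimises squared error against the current residuals."""
    best = None
    # Candidate thresholds: every distinct x value except the largest,
    # so both sides of the split are always non-empty.
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1], best[2], best[3]


def predict_stump(stump, xi):
    threshold, left_mean, right_mean = stump
    return left_mean if xi <= threshold else right_mean


def rmse(y, pred):
    return sqrt(sum((a - b) ** 2 for a, b in zip(y, pred)) / len(y))


def boost(x, y, n_rounds=20, learning_rate=0.3):
    """Squared-error boosting: each round fits a stump to the residuals
    of the current predictions and adds a shrunken copy of it."""
    base = sum(y) / len(y)          # start from the mean claim count
    pred = [base] * len(y)
    stumps = []
    history = [rmse(y, pred)]       # training RMSE before any tree
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + learning_rate * predict_stump(stump, xi)
                for xi, pi in zip(x, pred)]
        history.append(rmse(y, pred))
    return base, stumps, history


# Hypothetical toy data: one rating factor and claim counts in {0, 1, 2, 3}.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 1, 1, 2, 2, 3]
base, stumps, history = boost(x, y)
print(f"training RMSE: {history[0]:.3f} -> {history[-1]:.3f}")
```

In this sketch the training RMSE cannot increase from one round to the next, because each stump is the least-squares fit to the current residuals and the learning rate lies between 0 and 1. Training error alone says nothing about out-of-sample performance, which is why cross-validation is needed.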

Keywords: Big data, frequency, machine learning, ensemble learning, gradient boost, XGBoost


How to Cite

Kollongei, Naomi, and Fredrick Onyango. 2024. “Motor Insurance Claim Frequency Prediction Using XGBoost”. Asian Journal of Probability and Statistics 26 (10):155-70. https://doi.org/10.9734/ajpas/2024/v26i10665.