Author: Shaban, Daila Sherif. / Title: Using Some Data Mining Approaches with Application on Insurance Data

Title

Using Some Data Mining Approaches with Application on Insurance Data

Author

Shaban, Daila Sherif.

Committee

Researcher / Dalia Sherif Shaban

Supervisor / Zohdy Mohamed Nofal

Supervisor / Aya Shehata Mahmoud

Examiner / Salah Mahdy Mohamed

Subject

Data Mining.

Publication date

2024.

Number of pages

125 p.

Language

English

Degree

Master's

Specialization

Statistics and Probability

Approval date

22/8/2024

Place of approval

Benha University - Faculty of Commerce - Statistics

Only 14 of 147 pages are available for public view

Abstract

The insurance industry has many objectives to achieve and many problems to overcome: providing high-quality insurance services is one of its objectives, and fraud is one of its problems. This study analyzes two data sets with data mining techniques, which are important both for predicting fraud and for improving the quality of the insurance services provided. A variety of data mining classification techniques are used to predict the target class of each data set: car insurance (the target of the first data set) and fraud_reported (the target of the second data set). The data were cleaned and pre-processed by removing duplicates, filling in missing values, label-encoding the categorical variables, and detecting outliers. The data were then split into training and test sets, after which the features were standardized and class-balancing techniques were applied. Finally, several data mining models were evaluated on the data. For the first data set (car insurance data), the best-performing model is Random Forest with the undersampling technique, which gives 83.111% accuracy, 83.035% recall, 74.598% precision, 78.591% F1_score, and 64.965% MCC. Among the proposed new models, the best is GRA, with 82.444% accuracy, 76.724% recall, 77.616% precision, 77.167% F1_score, 62.911% MCC, and 81% AUC as a soft classifier after applying the SMOTE technique. For the insurance fraud detection data, the best model is AdaBoost after applying the SMOTE technique, with 92% accuracy, 73.170% recall, 81.081% precision, 76.923% F1_score, 72.238% MCC, and 85% AUC. Among the proposed new models, the best is AGR, which as a hard classifier gives 89.333% accuracy, 68.292% recall, 71.794% precision, 70% F1_score, 63.547% MCC, and 81% AUC.
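
The evaluation measures named in the abstract (accuracy, recall, precision, F1_score, and MCC) can all be computed from the four cells of a binary confusion matrix. The following is a minimal sketch of that arithmetic, not the thesis's own code. The function name `classification_metrics` and the counts passed to it are assumptions made for illustration: this record does not give any confusion matrix, and the counts were chosen only because they yield percentages of the same size as those quoted for AdaBoost.

```python
from math import sqrt


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute standard binary-classification measures from confusion-matrix counts.

    tp/fp/fn/tn are the numbers of true positives, false positives,
    false negatives and true negatives on the test set.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard each ratio against an empty denominator.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # Matthews correlation coefficient: uses all four cells, so it stays
    # informative when the classes are imbalanced (as fraud data usually is).
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts (NOT taken from the thesis), used only as a worked example.
metrics = classification_metrics(tp=30, fp=7, fn=11, tn=177)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.3f}%")
```

With these illustrative counts, each printed value lies within 0.001 percentage points of the corresponding AdaBoost figure in the abstract, which shows how the five measures are tied to one another through the same four counts. AUC is not included because it depends on the classifier's scores rather than on a single confusion matrix.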