In this project, I used Python to predict customer churn for a telecom company. After going through the telecom data, I focused on processing it for modelling and on trying different algorithms to evaluate their performance.
First, I analyzed the features to understand them and gain some insights.
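One simple way to start this kind of analysis is to compare the churn rate across the values of each categorical feature. The sketch below assumes a pandas DataFrame with a binary `Churn` column (1 = churned). The tiny sample and the column names `Contract`, `tenure` and `Churn` are hypothetical and only for illustration; they are not taken from the original notebook.

```python
import pandas as pd

# Hypothetical sample: column names and values are illustrative only.
df = pd.DataFrame({
    "Contract": ["Month-to-month", "Month-to-month", "One year", "Two year",
                 "Month-to-month", "One year", "Two year", "Month-to-month"],
    "tenure": [1, 5, 24, 60, 3, 30, 48, 2],
    "Churn": [1, 1, 0, 0, 0, 0, 0, 1],
})

# Churn rate (mean of the 0/1 target) for each contract type, highest first.
churn_by_contract = df.groupby("Contract")["Churn"].mean().sort_values(ascending=False)
print(churn_by_contract)

# Summary statistics of a numeric feature, split by churn status.
print(df.groupby("Churn")["tenure"].describe())
```

In this toy sample, month-to-month customers churn at a rate of 0.75 while the other contract types do not churn at all; on real data, a gap like this would hint that the feature is informative.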
Second, I prepared the data for modelling.
I applied one-hot encoding to the categorical features, split the data into training and test sets, and standardized the features of each set.
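These preprocessing steps can be sketched with pandas and scikit-learn as shown below. The DataFrame, its column names and the 70/30 split are hypothetical. One choice here may differ from the notebook: the scaler is fitted on the training set only and then applied to both sets. This is a common practice that keeps test-set statistics from leaking into training.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical data: column names and values are illustrative only.
df = pd.DataFrame({
    "gender": ["Male", "Female", "Female", "Male", "Female",
               "Male", "Male", "Female", "Female", "Male"],
    "Contract": ["Month-to-month", "One year", "Two year", "Month-to-month", "One year",
                 "Two year", "Month-to-month", "Month-to-month", "One year", "Two year"],
    "tenure": [1, 24, 60, 3, 30, 48, 2, 5, 12, 70],
    "MonthlyCharges": [70.5, 55.0, 20.3, 89.9, 45.2, 25.0, 99.1, 80.0, 60.0, 19.8],
    "Churn": [1, 0, 0, 1, 0, 0, 1, 1, 0, 0],
})

y = df["Churn"]
X = df.drop(columns="Churn")

# One-hot encode the categorical (object-typed) columns.
categorical = X.select_dtypes(include="object").columns
X = pd.get_dummies(X, columns=categorical, dtype=float)

# Hold out a stratified test set so both sets keep the same churn ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Fit the scaler on the training set only, then apply it to both sets.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

After encoding, `X` has seven columns: the two numeric features plus one indicator column per category value (for example `Contract_Two year`).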
Using the complete training data, I tested the following models and algorithms:
- Logistic Regression (Scikit-Learn)
- Multi-Layer Perceptron (Scikit-Learn)
- Gradient Boosting (Scikit-Learn)
- Extreme Gradient Boosting (XGBoost)
- K-Nearest Neighbors (KNN)
- Naive Bayes
- Decision Trees
- Support Vector Machines (SVM)

Initially, I tested the models' performance on the validation set.
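The comparison of the listed models can be sketched as a loop. Each scikit-learn model is fitted on a training split, and the loop records its accuracy and F1 score on a validation split along with its fitting time, so both scores and speed can be weighed. Several things here are assumptions rather than the notebook's setup: the data come from `make_classification` as a synthetic, imbalanced stand-in for the churn features, the hyperparameters are defaults or illustrative values, and `XGBClassifier` is added only if the `xgboost` package is installed.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the churn data (about 27% positives, an assumption).
X, y = make_classification(n_samples=1000, n_features=20, weights=[0.73], random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
scaler = StandardScaler().fit(X_train)
X_train, X_val = scaler.transform(X_train), scaler.transform(X_val)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "MLP": MLPClassifier(max_iter=500, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "KNN": KNeighborsClassifier(),
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(),
}
try:  # optional: only if the xgboost package is installed
    from xgboost import XGBClassifier
    models["XGBoost"] = XGBClassifier()
except ImportError:
    pass

results = {}
for name, model in models.items():
    start = time.perf_counter()
    model.fit(X_train, y_train)
    fit_seconds = time.perf_counter() - start
    y_pred = model.predict(X_val)
    results[name] = {
        "accuracy": accuracy_score(y_val, y_pred),
        "f1": f1_score(y_val, y_pred),
        "fit_seconds": fit_seconds,
    }

# Print models sorted by validation F1, best first.
for name, r in sorted(results.items(), key=lambda item: item[1]["f1"], reverse=True):
    print(f"{name:20s} acc={r['accuracy']:.3f} f1={r['f1']:.3f} fit={r['fit_seconds']:.2f}s")
```

F1 is reported next to accuracy because, with imbalanced churn labels, a model that rarely predicts churn can still reach a high accuracy.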
Judging by the score metrics and speed, the model I would choose is the SVM, with XGBoost close behind.
However, I still believe the accuracy can be improved by applying feature engineering to the data, trying other models, or even building an ensemble of all the tested models.
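One possible starting point for the ensemble idea is scikit-learn's `VotingClassifier`. It combines several models either by majority vote (`voting="hard"`) or by averaging their predicted probabilities (`voting="soft"`). The sketch below uses soft voting over three of the tested model types on synthetic data. The choice of members, the synthetic data and the `probability=True` setting (which SVC needs for soft voting) are assumptions for illustration, not results from this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the preprocessed churn data.
X, y = make_classification(n_samples=1000, n_features=20, weights=[0.73], random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Soft voting averages predict_proba outputs, so SVC needs probability=True.
ensemble = VotingClassifier(
    estimators=[
        ("logreg", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ("svm", make_pipeline(StandardScaler(), SVC(probability=True, random_state=0))),
        ("gb", GradientBoostingClassifier(random_state=0)),
    ],
    voting="soft",
)
ensemble.fit(X_train, y_train)
val_accuracy = accuracy_score(y_val, ensemble.predict(X_val))
print(f"Soft-voting ensemble validation accuracy: {val_accuracy:.3f}")
```

Scaling sits inside each member's pipeline, so the ensemble can be fitted on unscaled data. `StackingClassifier`, which trains a meta-model on the members' predictions, is another option worth comparing.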
The complete code is available in a Jupyter notebook: Click here.