Telecom Churn Analysis

Jul 15, 2020

Project: Predicting churn for a telecom company so that it can focus a customer-retention marketing program (e.g. a special offer) on the subset of clients most likely to change carriers, or improve the aspects of its service that the model identifies. The “churn” column is therefore the target, and the predictive analysis is a supervised classification problem.

What is Churn?

The churn rate is the percentage of subscribers to a service who discontinue their subscriptions to the service within a given time period.

For a company to expand its clientele, its growth rate, measured by the rate at which it acquires new customers, must exceed its churn rate.
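A worked example with made-up numbers (not from the dataset) shows why the two rates have to be compared: the churn rate is customers lost divided by customers at the start of the period, and if it exceeds the acquisition rate the customer base shrinks. The function names below are hypothetical.

```python
def churn_rate(customers_lost: int, customers_at_start: int) -> float:
    """Share of the subscribers at the start of a period who cancel during it."""
    return customers_lost / customers_at_start


def acquisition_rate(new_customers: int, customers_at_start: int) -> float:
    """New subscribers gained during the period, relative to the starting base."""
    return new_customers / customers_at_start


# Hypothetical month: 1,000 subscribers at the start, 50 cancel, 40 sign up.
start, lost, new = 1000, 50, 40
churn = churn_rate(lost, start)
growth = acquisition_rate(new, start)
end = start - lost + new

# Churn (5%) outpaces acquisition (4%), so the base shrinks to 990.
print(f"churn {churn:.1%}, acquisition {growth:.1%}, customers at end {end}")
```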

Why is churn so important?

Churn is important because it directly affects a service’s profitability. It is common to assume that the profitability of a service is directly related to the growth of its customer base. That leads business owners to conclude that, in order to grow their customer base, the rate of acquiring new customers must exceed the churn rate.

Data: the “Churn in Telecom’s dataset”, which can be found at https://www.kaggle.com/becksddf/churn-in-telecoms-dataset

Columns and datatypes:

state                      object
account length              int64
area code                   int64
phone number               object
international plan         object
voice mail plan            object
number vmail messages       int64
total day minutes         float64
total day calls             int64
total day charge          float64
total eve minutes         float64
total eve calls             int64
total eve charge          float64
total night minutes       float64
total night calls           int64
total night charge        float64
total intl minutes        float64
total intl calls            int64
total intl charge         float64
customer service calls      int64
churn                        bool

Data Cleaning: The columns “state”, “international plan” and “voice mail plan” hold string values (dtype object), while “churn” is already boolean. The two plan columns contain only the values “yes” or “no” and are converted to 1 and 0 respectively.

The “state” column is converted using scikit-learn’s LabelEncoder, which replaces each unique label with a unique integer. Label encoding is used here instead of dummy variables because the column has many distinct values.
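To make the trade-off concrete, the sketch below encodes a handful of hypothetical state abbreviations both ways. LabelEncoder keeps a single column of integers, assigned in sorted label order, while pandas’ get_dummies adds one indicator column per distinct value, which for a column with dozens of states means dozens of extra features. Label encoding does impose an arbitrary ordering on the states, which tree-based models tolerate better than linear ones.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample of the "state" column (not real rows from the dataset).
states = pd.Series(["NY", "CA", "TX", "CA", "OH"], name="state")

# Label encoding: one integer per unique label, in sorted label order.
le = LabelEncoder()
codes = le.fit_transform(states)
print(list(le.classes_))
print(codes)  # [1 0 3 0 2]

# Dummy variables: one indicator column per unique label.
dummies = pd.get_dummies(states, prefix="state")
print(dummies.shape)  # (5, 4)
```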

The “phone number” column is removed because every customer has a unique phone number, so it carries no predictive information.

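# The column is named "phone_number" once spaces in the column names are replaced with underscores (as preprocess_data does)
df = df.drop(["phone_number"], axis=1)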

preprocess_data function:

from sklearn.preprocessing import LabelEncoder

def preprocess_data(df):
    pre_df = df.copy()

    # Replace the spaces in the column names with underscores
    pre_df.columns = [s.replace(" ", "_") for s in pre_df.columns]

    # Convert the "yes"/"no" plan columns to 1/0
    pre_df["international_plan"] = pre_df["international_plan"].apply(lambda x: 0 if x == "no" else 1)
    pre_df["voice_mail_plan"] = pre_df["voice_mail_plan"].apply(lambda x: 0 if x == "no" else 1)

    # Encode each state as an integer; the fitted encoder is returned so it can be reused
    le = LabelEncoder()
    pre_df["state"] = le.fit_transform(pre_df["state"])

    return pre_df, le

Checking value counts for churn:

pre_df["churn"].value_counts()

False    2850
True      483

There are far more customers without churn (2850) than with churn (483), so the target variable is imbalanced. This could lead to predictive models that are biased towards the majority class (i.e. no churn). To deal with this issue, we investigate the use of oversampling when building the models.
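The counts above also give a quick sanity check for the models that follow: a model that always predicts “no churn” would already be about 85.5% accurate, so accuracy alone says little on this data and the precision and recall of the churn class matter more. A small computation from the reported counts:

```python
# Value counts for the "churn" column, as reported above.
counts = {False: 2850, True: 483}

total = sum(counts.values())
churn_share = counts[True] / total
# Accuracy of a trivial model that predicts "no churn" for every customer.
baseline_accuracy = counts[False] / total

# prints: 3333 customers, 14.5% churned, majority-class baseline accuracy 85.5%
print(f"{total} customers, {churn_share:.1%} churned, "
      f"majority-class baseline accuracy {baseline_accuracy:.1%}")
```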

Exploratory data analysis (EDA), framed as questions:

Q1) Do we see an increase in churn rate with more customer service calls?

The churn rate increases with the number of customer service calls a customer makes.
Some customers may have called repeatedly for a fix and switched carriers when their issue was not resolved.
Was there an issue with service quality, coverage, etc.?

It would be interesting to investigate what types of issues customers called in about.
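The article does not show the code behind this observation. One way to compute the churn rate per number of customer service calls is a pandas group-by; the sketch below runs on a few hypothetical rows, and on the real data the same line would be applied to pre_df.

```python
import pandas as pd

# Hypothetical toy rows standing in for pre_df (the real data has 3333 customers).
toy = pd.DataFrame({
    "customer_service_calls": [0, 0, 1, 1, 1, 4, 4, 5],
    "churn": [False, False, False, False, True, True, False, True],
})

# Casting churn to float makes each group mean the share of churners in that group.
rate_by_calls = toy["churn"].astype(float).groupby(toy["customer_service_calls"]).mean()
print(rate_by_calls)
```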

Q2) How does account length matter?

Churn is noticeably highest in the account-length bracket of about 75 to 100 weeks.
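A similar sketch for account length buckets customers into brackets with pd.cut before taking the churn rate per bracket. The 25-unit bracket width and the toy rows are assumptions; the article does not state how its brackets were chosen.

```python
import pandas as pd

# Hypothetical toy rows; on the real data this would be pre_df.
toy = pd.DataFrame({
    "account_length": [10, 30, 60, 80, 90, 95, 120, 150],
    "churn": [0, 0, 0, 1, 1, 0, 0, 0],
})

# Assumed bracket width of 25: intervals (0, 25], (25, 50], ..., (225, 250].
brackets = pd.cut(toy["account_length"], bins=list(range(0, 251, 25)))
# observed=True keeps only brackets that actually contain customers.
rate_by_bracket = toy["churn"].groupby(brackets, observed=True).mean()
print(rate_by_bracket)
print("highest churn bracket:", rate_by_bracket.idxmax())
```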

Q3) Monthly/Yearly Charge vs Churn Rate

Customers who churned were paying more per month than customers who did not churn, about 10–15 dollars extra a month. The churn rate also increased with the total day charge. Overall, churners paid approximately 100–150 dollars more than customers who did not churn.
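The dataset has no monthly-charge column, and the article does not say how one was derived. A plausible assumption, used in the sketch below, is to add the day, evening, night and international charges per customer and compare the mean for churners and non-churners; the toy numbers are invented.

```python
import pandas as pd

# Hypothetical toy rows using the four charge columns from the dataset.
toy = pd.DataFrame({
    "total_day_charge": [30.0, 25.0, 45.0, 50.0],
    "total_eve_charge": [17.0, 16.0, 18.0, 19.0],
    "total_night_charge": [9.0, 8.0, 9.5, 10.0],
    "total_intl_charge": [2.5, 3.0, 2.5, 3.0],
    "churn": [False, False, True, True],
})

charge_cols = ["total_day_charge", "total_eve_charge", "total_night_charge", "total_intl_charge"]
# Assumption: a customer's total charge is the sum of the four charge columns.
toy["total_charge"] = toy[charge_cols].sum(axis=1)

# Mean total charge per churn group, as a {False: ..., True: ...} dict.
mean_charge = toy.groupby("churn")["total_charge"].mean().to_dict()
print(mean_charge)
print("extra paid by churners:", mean_charge[True] - mean_charge[False])
```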

Q4) How does international plan affect churn rate?

Only a few customers have the international plan, but among them the churn rate is high: slightly less than 50%.

In this data, that may suggest that the customers who left were unhappy with the international plan charges.
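The churn rate by plan membership can be read off a normalized cross-tabulation. The sketch uses hypothetical rows in which international_plan is already encoded as 0/1, as preprocess_data does; each row of the table sums to 1, so the True column is the churn rate for that plan.

```python
import pandas as pd

# Hypothetical toy rows; international_plan is 0/1 after preprocess_data.
toy = pd.DataFrame({
    "international_plan": [0, 0, 0, 0, 0, 0, 1, 1, 1, 1],
    "churn": [False, False, False, False, False, True, True, True, False, False],
})

# normalize="index" turns counts into within-row shares (columns: False, True).
ct = pd.crosstab(toy["international_plan"], toy["churn"], normalize="index")
print(ct)
```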

Model:

The following models were tried with this dataset:

LogisticRegression
XGBClassifier
MultinomialNB
AdaBoostClassifier
KNeighborsClassifier
GradientBoostingClassifier
ExtraTreesClassifier
DecisionTreeClassifier

Of these, the best performers were GradientBoostingClassifier and XGBClassifier.
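The article does not show its comparison loop. A minimal sketch of one, assuming scikit-learn only and synthetic data from make_classification in place of the real features (xtrain, xtest, ytrain, ytest mirror the article’s names), fits three of the listed models and ranks them by ROC AUC. Two details may be worth noting: test_size=0.25, train_test_split’s default, gives 834 test rows out of 3333, consistent with the support totals in the reports below; and roc_auc_score is usually given predict_proba probabilities, whereas passing hard predict() labels, as the snippets below do, yields the AUC of a single threshold. XGBClassifier offers the same fit/predict_proba interface and could be added to the list.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed data: 3333 rows, roughly 14.5% positives.
X, y = make_classification(n_samples=3333, n_features=18, weights=[0.855], random_state=0)

# A 25% test split gives 834 test rows; stratify keeps the churn share equal in both splits.
xtrain, xtest, ytrain, ytest = train_test_split(X, y, test_size=0.25, stratify=y, random_state=0)

models = [
    LogisticRegression(max_iter=1000),
    DecisionTreeClassifier(random_state=0),
    GradientBoostingClassifier(random_state=0),
]

scores = {}
for model in models:
    model.fit(xtrain, ytrain)
    # Score the predicted churn probability rather than hard 0/1 labels.
    proba = model.predict_proba(xtest)[:, 1]
    scores[type(model).__name__] = roc_auc_score(ytest, proba)

for name, auc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:28s} AUC={auc:.3f}")
```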

Let us compare these two models before and after applying the Synthetic Minority Over-sampling Technique (SMOTE).

Note: SMOTE is applied only to the training data, so the test set keeps its original class distribution.
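The article does not show how X_train_smote and y_train_smote were produced; the usual library route is imbalanced-learn’s SMOTE(...).fit_resample(xtrain, ytrain). To show what that step does, here is a minimal NumPy-only sketch of the core SMOTE idea (the smote function is a hypothetical helper, not the library API): each synthetic sample is placed at a random point on the segment between a minority sample and one of its k nearest minority neighbours, until both classes have the same count.

```python
import numpy as np


def smote(X, y, minority_label=1, k=5, random_state=0):
    """Minimal SMOTE sketch: add synthetic minority samples until classes are balanced."""
    rng = np.random.default_rng(random_state)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X_min = X[y == minority_label]
    n_needed = int((y != minority_label).sum() - len(X_min))
    if n_needed <= 0:
        return X, y

    # Pairwise distances among minority samples; a point is not its own neighbour.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    k = min(k, len(X_min) - 1)
    neighbours = np.argsort(d, axis=1)[:, :k]

    # Pick a minority point, one of its k nearest minority neighbours,
    # and a random position on the line segment between them.
    base = rng.integers(0, len(X_min), size=n_needed)
    nbr = neighbours[base, rng.integers(0, k, size=n_needed)]
    gap = rng.random((n_needed, 1))
    synthetic = X_min[base] + gap * (X_min[nbr] - X_min[base])

    X_res = np.vstack([X, synthetic])
    y_res = np.concatenate([y, np.full(n_needed, minority_label, dtype=y.dtype)])
    return X_res, y_res


# Usage on hypothetical training data: 15 minority vs 85 majority samples.
rng = np.random.default_rng(1)
X_train = rng.normal(size=(100, 3))
y_train = np.array([1] * 15 + [0] * 85)
X_train_smote, y_train_smote = smote(X_train, y_train)
print(np.bincount(y_train_smote))  # [85 85]
```

Only the training split is passed in; the test split is left untouched so that evaluation reflects the real class balance.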

Gradient Boosting Classifier:

# Gradient Boosting Classifier without SMOTE
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, classification_report, roc_auc_score

clf = GradientBoostingClassifier()
clf.fit(xtrain, ytrain)
y_test_preds = clf.predict(xtest)
report = classification_report(ytest, y_test_preds)
print(report)

              precision    recall  f1-score   support

       False       0.96      0.98      0.97       713
        True       0.89      0.74      0.81       121

    accuracy                           0.95       834
   macro avg       0.92      0.86      0.89       834
weighted avg       0.95      0.95      0.95       834

# Gradient Boosting Classifier with SMOTE (fitted on the oversampled training set)
clf = GradientBoostingClassifier()
clf.fit(X_train_smote, y_train_smote)
y_test_pred_smote = clf.predict(xtest)
print("accuracy_score", accuracy_score(ytest, y_test_pred_smote))
print("auc", roc_auc_score(ytest, y_test_pred_smote))

accuracy_score 0.8920863309352518
auc 0.8099405375957714

report = classification_report(ytest, y_test_pred_smote)
print(report)

              precision    recall  f1-score   support

       False       0.95      0.93      0.94       713
        True       0.61      0.69      0.65       121

    accuracy                           0.89       834
   macro avg       0.78      0.81      0.79       834
weighted avg       0.90      0.89      0.89       834

XGBoost Classifier:

# XGBoost without SMOTE
import xgboost as xgb
from sklearn.metrics import accuracy_score, roc_auc_score

clf = xgb.XGBClassifier(max_depth=7, n_estimators=200, colsample_bytree=0.8,
                        subsample=0.8, n_jobs=10, learning_rate=0.1)
clf.fit(xtrain, ytrain)
pred = clf.predict(xtest)
print("accuracy_score", accuracy_score(ytest, pred))
print("auc", roc_auc_score(ytest, pred))
xg = [clf.__class__, accuracy_score(ytest, pred), roc_auc_score(ytest, pred)]

accuracy_score 0.947242206235012
auc 0.8593534477762451

# XGBoost "with SMOTE": as written, this model is fitted on xtrain, ytrain (not on
# X_train_smote, y_train_smote), so it matches the model above and the scores are
# identical. To train on the oversampled data, call clf_xg.fit(X_train_smote, y_train_smote).
clf_xg = xgb.XGBClassifier(max_depth=7, n_estimators=200, colsample_bytree=0.8,
                           subsample=0.8, n_jobs=10, learning_rate=0.1)
clf_xg.fit(xtrain, ytrain)
y_test_pred_smote_xg = clf_xg.predict(xtest)
print("accuracy_score", accuracy_score(ytest, y_test_pred_smote_xg))
print("auc", roc_auc_score(ytest, y_test_pred_smote_xg))

accuracy_score 0.947242206235012
auc 0.8593534477762451

I chose the XGBoost classifier as my final model. Its feature importances can be plotted as follows:

import pandas as pd

# x is assumed to be the feature DataFrame that xtrain and xtest were split from
feat_importances = pd.Series(clf_xg.feature_importances_, index=x.columns)
feat_importances.nlargest(20).plot(kind="barh", figsize=(13, 8))

From these feature importances, the four most important features are:

International Plan
Voice Mail Plan
Customer Service Calls
Total Day Charge

It would be great to know every single customer who will churn, but how much insight would that information really bring? How would you know what to focus on if you wanted to keep them, and how much could you spend to keep them before retaining them turned into a loss?

Recommendations based on the model:

1) One of the most important predictors in the model is the number of customer service calls. This suggests that the company should improve its customer service and resolve the issues of customers who call in repeatedly.

2) Other important features are total day minutes and total day charge. The company could either lower its per-minute charge for clients with many day minutes or offer flat-rate calling plans.

3) I would suggest that the telecom company present special offers according to each customer’s lifetime value (account length).

4) Follow up with customers on the international plan, since almost 50% of the customers with the plan left, and investigate the reason.

Future Work:

Get data on payment methods.
See if churn rate is affected by special promotions of other carriers.
Internet plans.
Contract-based plans (Month-to-Month, One Year, Two Year)
Type of issue information in customer service calls.
Survival Analysis (the Cox Proportional Hazards Model): the Cox PH model is a regression-based model that relates the features of a dataset to how long a customer “survives” (stays with the carrier). It is called a proportional hazards model because each feature multiplies the baseline hazard, the instantaneous risk of churning shared by the cohort, by a constant factor, so each feature proportionally raises or lowers a customer’s risk relative to that baseline.
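In its standard form (general notation, not from the article), the Cox model writes the hazard of customer i with features x_i at time t as a shared baseline hazard h_0(t) times a feature-dependent factor. The ratio of two customers’ hazards does not depend on t, which is what “proportional” refers to, and one extra unit of feature x_j multiplies the hazard by e^{β_j}.

```latex
h(t \mid \mathbf{x}_i) = h_0(t)\,\exp\!\left(\beta_1 x_{i1} + \cdots + \beta_p x_{ip}\right),
\qquad
\frac{h(t \mid \mathbf{x}_a)}{h(t \mid \mathbf{x}_b)} = \exp\!\left(\boldsymbol{\beta}^{\top}(\mathbf{x}_a - \mathbf{x}_b)\right)
```

For instance, a hypothetical coefficient of 0.2 on customer service calls would mean each extra call multiplies the churn hazard by e^{0.2} ≈ 1.22.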

Written by Saif Kasmani
