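Python in Practice: Bank Customer Churn Prediction Explained

Project Overview

In this project we predict bank customer churn. Let's start with the data, which is stored in two files: 'Churn-Modelling.csv' holds the training data and 'Churn-Modelling-Test-Data.csv' holds the test data. The fields are described below:

The data is real data from an overseas source that has been anonymized.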

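RowNumber: row number

CustomerID: customer ID

Surname: customer surname

CreditScore: credit score

Geography: customer's country / region

Gender: customer's gender

Age: age

Tenure: number of years the customer has been with this bank

Balance: account balance (deposit / loan position)

NumOfProducts: number of products used

HasCrCard: whether the customer holds this bank's credit card

IsActiveMember: whether the customer is an active member

EstimatedSalary: estimated salary

Exited: whether the customer has churned; this is used as the label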

First, import some commonly used modules.

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn import neighbors
from sklearn.metrics import classification_report
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import LabelEncoder

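Then read the data with numpy. Because the data contains string values, set dtype to str when reading. (The np.str alias used in older code was removed in NumPy 1.24; the built-in str is equivalent.)

train_data = np.genfromtxt('Churn-Modelling.csv', delimiter=',', dtype=str)
test_data = np.genfromtxt('Churn-Modelling-Test-Data.csv', delimiter=',', dtype=str)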

Split the data. The header row is not needed; columns 0 through the second-to-last are the features, and the last column is the label.

x_train = train_data[1:,:-1]
y_train = train_data[1:,-1]
x_test = test_data[1:,:-1]
y_test = test_data[1:,-1]
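To make the read-and-slice step concrete, here is a minimal sketch that runs on a tiny in-memory CSV instead of the real files. The rows and the reduced set of columns are made up for illustration; only the np.genfromtxt call and the slicing pattern mirror the steps above.

```python
import io
import numpy as np

# A made-up stand-in for 'Churn-Modelling.csv' with fewer columns.
csv_text = """RowNumber,CustomerId,Surname,CreditScore,Geography,Gender,Exited
1,1001,Smith,619,France,Female,1
2,1002,Jones,608,Spain,Female,0
3,1003,Brown,502,Germany,Male,1
"""

# dtype=str keeps every cell as text, including the header row.
data = np.genfromtxt(io.StringIO(csv_text), delimiter=',', dtype=str)

# Skip the header (row 0); the last column is the label, the rest are features.
features = data[1:, :-1]
labels = data[1:, -1]

print(data.shape, features.shape, labels.shape)
```

Because everything is still text at this point, the labels come back as strings such as '1' and '0' and only become numbers in the later conversion step.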

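Columns 0, 1, and 2 are the row number, customer ID, and surname. These three fields should have little effect on the final result, so they can be dropped.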

x_train = np.delete(x_train,[0,1,2],axis=1)
x_test = np.delete(x_test,[0,1,2],axis=1)

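After dropping those three columns, columns 1 and 2 of the remaining data are country / region and gender. Both hold string values and must be converted to numeric values before a model can be built.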

labelencoder1 = LabelEncoder()
x_train[:,1] = labelencoder1.fit_transform(x_train[:,1])
x_test[:,1] = labelencoder1.transform(x_test[:,1])
labelencoder2 = LabelEncoder()
x_train[:,2] = labelencoder2.fit_transform(x_train[:,2])
x_test[:,2] = labelencoder2.transform(x_test[:,2])
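The following small sketch shows what LabelEncoder does to a string column, using made-up country values. Note that the integer codes imply an ordering (France < Germany < Spain) that has no real meaning; a one-hot encoding such as scikit-learn's OneHotEncoder is a common alternative for features, though that is a suggestion and not something the original project does.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Made-up values standing in for the Geography column.
train_geo = np.array(['France', 'Spain', 'Germany', 'France'])
test_geo = np.array(['Germany', 'France'])

encoder = LabelEncoder()
# fit_transform learns the sorted set of classes from the training data.
train_codes = encoder.fit_transform(train_geo)
# transform reuses that mapping, so test codes stay consistent with training.
test_codes = encoder.transform(test_geo)

print(encoder.classes_)   # classes in sorted order
print(train_codes)
print(test_codes)
```

Fitting on the training column only and then calling transform on the test column is what keeps the two encodings aligned; transform raises an error if the test data contains a value never seen during fitting.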

Because the data was read in with a string dtype, convert it from string to float before training the model.

x_train = x_train.astype(np.float32)
x_test = x_test.astype(np.float32)
y_train = y_train.astype(np.float32)
y_test = y_test.astype(np.float32)

Then standardize the data.

sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)
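As a sketch of what the scaler computes, the example below compares StandardScaler against the formula z = (x - mean) / std worked out by hand with numpy. The numbers are made up; the point is that the mean and standard deviation come from the training data only and are then reused for the test data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up feature matrices with two columns on very different scales.
train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
test = np.array([[2.0, 40.0]])

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)  # learns mean/std from train
test_scaled = scaler.transform(test)        # reuses the training statistics

# Manual version: StandardScaler uses the population std (ddof=0).
mean = train.mean(axis=0)
std = train.std(axis=0)
manual_test = (test - mean) / std

print(test_scaled)
print(manual_test)
```

Standardizing matters particularly for KNN, which relies on distances: without it, a large-valued column such as Balance or EstimatedSalary would dominate the distance computation.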

Build a KNN model and check its results on the test set.

knn = neighbors.KNeighborsClassifier(n_neighbors=5)
knn.fit(x_train, y_train)
predictions = knn.predict(x_test)
print(classification_report(y_test, predictions))
              precision    recall  f1-score   support
         0.0       0.80      0.95      0.87       740
         1.0       0.69      0.33      0.45       260
   micro avg       0.79      0.79      0.79      1000
   macro avg       0.75      0.64      0.66      1000
weighted avg       0.77      0.79      0.76      1000
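For intuition about what the classifier above is doing, here is a minimal numpy sketch of the k-nearest-neighbors idea: find the k closest training points by Euclidean distance and take a majority vote. The function name knn_predict and the toy data are hypothetical, and this is not scikit-learn's implementation, which is more general and far more efficient.

```python
import numpy as np

def knn_predict(x_train, y_train, x_query, k=5):
    """Predict the label of one query point by majority vote of its k nearest neighbors."""
    # Euclidean distance from the query to every training point.
    distances = np.linalg.norm(x_train - x_query, axis=1)
    # Indices of the k smallest distances.
    nearest = np.argsort(distances)[:k]
    # Count the labels among those neighbors; on a tie the smallest label wins.
    values, counts = np.unique(y_train[nearest], return_counts=True)
    return values[np.argmax(counts)]

# Toy data: two well-separated clusters.
x_toy = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y_toy = np.array([0, 0, 0, 1, 1, 1])

near_origin = knn_predict(x_toy, y_toy, np.array([0.5, 0.5]), k=3)
near_far = knn_predict(x_toy, y_toy, np.array([5.5, 5.5]), k=3)
print(near_origin, near_far)
```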

Build an MLP model and check its results on the test set.

mlp = MLPClassifier(hidden_layer_sizes=(20,10) ,max_iter=500)
mlp.fit(x_train,y_train)
predictions = mlp.predict(x_test)
print(classification_report(y_test, predictions))
              precision    recall  f1-score   support
         0.0       0.82      0.96      0.88       740
         1.0       0.77      0.38      0.51       260

   micro avg       0.81      0.81      0.81      1000
   macro avg       0.79      0.67      0.70      1000
weighted avg       0.80      0.81      0.79      1000
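When reading the two reports, the row for class 1.0 (churned customers) is the one to watch: recall is 0.33 for KNN and 0.38 for MLP, so both models miss well over half of the customers who actually left, even though overall accuracy looks reasonable. The short check below recomputes the F1 score for that class from the reported precision and recall, using F1 = 2PR / (P + R). The inputs are the rounded values from the reports, so the results are approximate; the helper name f1_from is hypothetical.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Class 1.0 (churned) figures copied from the two reports above.
knn_f1 = f1_from(0.69, 0.33)
mlp_f1 = f1_from(0.77, 0.38)

print(round(knn_f1, 2), round(mlp_f1, 2))
```

Both values agree with the f1-score column of the reports (0.45 and 0.51) after rounding.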

Project Files

Baidu Netdisk: https://pan.baidu.com/s/18gOPf-pGJ3aq75Txm88yjg

Password: 4t6k

Source: http://www.bubuko.com/infodetail-2883912.html
