AI Helps You Save on Travel! Accurately Predicting Homestay Listing Prices! (Part 6)


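  • Machine Learning in Practice: A Hands-On Machine Learning Series
  • Machine Learning in Practice | Getting Started with SKLearn and Simple Application Cases
  • Machine Learning in Practice | The Most Complete SKLearn Application Guide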
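Linear Regression Modeling

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split


def linear_reg(df, test_size=0.3, random_state=42):
    '''
    Build the model and return the evaluation results.
    Input: data as a DataFrame
    Output: feature importance and evaluation metrics (RMSE and R-squared)
    '''
    X = df.drop(columns=['price'])
    y = df[['price']]
    X_columns = X.columns

    # Split into training and test sets
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state)

    # Linear regression model (a regressor, not a classifier)
    clf = LinearRegression()

    # Candidate parameters. n_jobs only controls parallelism and does not
    # change the fitted model, so fit_intercept is the setting that matters.
    parameters = {'n_jobs': [1, 2, 5, 10, 100],
                  'fit_intercept': [True, False]}

    # Grid search with cross-validation
    cv = GridSearchCV(estimator=clf, param_grid=parameters, cv=3, verbose=3)
    cv.fit(X_train, y_train)

    # Predict on the test set
    pred = cv.predict(X_test)

    # Model evaluation
    r2 = r2_score(y_test, pred)
    mse = mean_squared_error(y_test, pred)
    rmse = mse ** .5

    # Best parameters and the coefficients of the refitted best model
    best_par = cv.best_params_
    coefficients = cv.best_estimator_.coef_

    # Feature importance: absolute coefficient values.
    # coef_ has shape (1, n_features) because y is a one-column DataFrame.
    importance = np.abs(coefficients)
    feature_importance = pd.DataFrame(importance, columns=X_columns).T
    feature_importance.columns = ['importance']
    feature_importance = feature_importance.sort_values('importance', ascending=False)

    print("The model performance for testing set")
    print("--------------------------------------")
    print('RMSE is {}'.format(rmse))
    print('R2 score is {}'.format(r2))
    print("\n")

    return feature_importance, rmse, r2


linear_feat_importance, linear_rmse, linear_r2 = linear_reg(model_df)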
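To see what the linear regression step produces without the full listing data, here is a minimal sketch on synthetic data. The column names `bedrooms`, `accommodates`, and `noise_feature`, and the way `price` is generated, are assumptions made up for illustration; they are not taken from the article's dataset. The sketch skips the grid search and only shows how RMSE, R-squared, and the absolute-coefficient importance are computed.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Hypothetical toy data: price depends strongly on bedrooms, weakly on
# accommodates, and not at all on noise_feature.
rng = np.random.default_rng(0)
n = 500
toy_df = pd.DataFrame({
    "bedrooms": rng.integers(1, 5, size=n).astype(float),
    "accommodates": rng.integers(1, 9, size=n).astype(float),
    "noise_feature": rng.normal(size=n),
})
toy_df["price"] = (50.0 * toy_df["bedrooms"]
                   + 10.0 * toy_df["accommodates"]
                   + rng.normal(scale=5.0, size=n))

X = toy_df.drop(columns=["price"])
y = toy_df["price"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)

# Same metrics as in the article: RMSE is the square root of the MSE.
rmse = mean_squared_error(y_test, pred) ** 0.5
r2 = r2_score(y_test, pred)

# Importance as the absolute value of each coefficient, largest first.
importance = pd.Series(np.abs(model.coef_), index=X.columns,
                       name="importance").sort_values(ascending=False)
print(importance)
print(f"RMSE: {rmse:.2f}, R2: {r2:.3f}")
```

One caveat on this design: absolute coefficients are only comparable across features that are on similar scales, so the ranking should be read with the feature units in mind.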
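Random Forest Modeling

import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split


# Random forest modeling
def random_forest(df):
    '''
    Build the model and return the evaluation results.
    Input: data as a DataFrame
    Output: evaluation metrics (RMSE and R-squared), best parameters, and feature importance
    '''
    X = df.drop(['price'], axis=1)
    X_columns = X.columns
    y = df['price']

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    # Random forest model
    clf = RandomForestRegressor()

    # Candidate parameters.
    # Note: a Python dict keeps only the last value for a repeated key, so
    # listing max_depth twice ([2, 3, 4, 5] and then [80, 90, 100]) searches
    # only the last list. The grid below is the one the log reflects.
    parameters = {'n_estimators': [50, 100, 200, 300, 400],
                  'max_depth': [80, 90, 100]}

    # Grid search with cross-validation
    cv = GridSearchCV(estimator=clf, param_grid=parameters, cv=5, verbose=3)
    model = cv
    model.fit(X_train, y_train)

    # Predict on the test set
    pred = model.predict(X_test)

    # Model evaluation
    mse = mean_squared_error(y_test, pred)
    rmse = mse ** .5
    r2 = r2_score(y_test, pred)

    # Best hyperparameters
    best_par = model.best_params_

    # Feature importance via permutation importance on the test set
    r = permutation_importance(model, X_test, y_test,
                               n_repeats=10,
                               random_state=0)
    perm = pd.DataFrame(columns=['AVG_Importance'], index=X_train.columns)
    perm['AVG_Importance'] = r.importances_mean
    perm = perm.sort_values(by='AVG_Importance', ascending=False)

    return rmse, r2, best_par, perm


# Run the modeling
r_forest_rmse, r_forest_r2, r_fores_best_params, r_forest_importance = random_forest(model_df)

The output is as follows: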
Fitting 5 folds for each of 15 candidates, totalling 75 fits
[CV 1/5] END ..................max_depth=80, n_estimators=50; total time=2.4s
[CV 2/5] END ..................max_depth=80, n_estimators=50; total time=1.9s
[CV 3/5] END ..................max_depth=80, n_estimators=50; total time=1.9s
[CV 4/5] END ..................max_depth=80, n_estimators=50; total time=1.9s
[CV 5/5] END ..................max_depth=80, n_estimators=50; total time=1.9s
[CV 1/5] END .................max_depth=80, n_estimators=100; total time=3.8s
[CV 2/5] END .................max_depth=80, n_estimators=100; total time=3.8s
[CV 3/5] END .................max_depth=80, n_estimators=100; total time=3.9s
[CV 4/5] END .................max_depth=80, n_estimators=100; total time=3.8s
[CV 5/5] END .................max_depth=80, n_estimators=100; total time=3.8s
[CV 1/5] END .................max_depth=80, n_estimators=200; total time=7.5s
[CV 2/5] END .................max_depth=80, n_estimators=200; total time=7.7s
[CV 3/5] END .................max_depth=80, n_estimators=200; total time=7.7s
[CV 4/5] END .................max_depth=80, n_estimators=200; total time=7.6s
[CV 5/5] END .................max_depth=80, n_estimators=200; total time=7.6s
[CV 1/5] END .................max_depth=80, n_estimators=300; total time=11.3s
[CV 2/5] END .................max_depth=80, n_estimators=300; total time=11.4s
[CV 3/5] END .................max_depth=80, n_estimators=300; total time=11.7s
[CV 4/5] END .................max_depth=80, n_estimators=300; total time=11.4s
[CV 5/5] END .................max_depth=80, n_estimators=300; total time=11.4s
[CV 1/5] END .................max_depth=80, n_estimators=400; total time=15.1s
[CV 2/5] END .................max_depth=80, n_estimators=400; total time=16.4s
[CV 3/5] END .................max_depth=80, n_estimators=400; total time=15.6s
[CV 4/5] END .................max_depth=80, n_estimators=400; total time=15.2s
[CV 5/5] END .................max_depth=80, n_estimators=400; total time=15.6s
[CV 1/5] END ..................max_depth=90, n_estimators=50; total time=1.9s
[CV 2/5] END ..................max_depth=90, n_estimators=50; total time=1.9s
[CV 3/5] END ..................max_depth=90, n_estimators=50; total time=2.0s
[CV 4/5] END ..................max_depth=90, n_estimators=50; total time=2.0s
[CV 5/5] END ..................max_depth=90, n_estimators=50; total time=2.0s
[CV 1/5] END .................max_depth=90, n_estimators=100; total time=3.9s
[CV 2/5] END .................max_depth=90, n_estimators=100; total time=3.9s
[CV 3/5] END .................max_depth=90, n_estimators=100; total time=4.0s
[CV 4/5] END .................max_depth=90, n_estimators=100; total time=3.9s
[CV 5/5] END .................max_depth=90, n_estimators=100; total time=3.9s
[CV 1/5] END .................max_depth=90, n_estimators=200; total time=8.7s
[CV 2/5] END .................max_depth=90, n_estimators=200; total time=8.1s
[CV 3/5] END .................max_depth=90, n_estimators=200; total time=8.1s
[CV 4/5] END .................max_depth=90, n_estimators=200; total time=7.7s
[CV 5/5] END .................max_depth=90, n_estimators=200; total time=8.0s
[CV 1/5] END .................max_depth=90, n_estimators=300; total time=11.6s
[CV 2/5] END .................max_depth=90, n_estimators=300; total time=11.8s
[CV 3/5] END .................max_depth=90, n_estimators=300; total time=12.2s
[CV 4/5] END .................max_depth=90, n_estimators=300; total time=12.0s
[CV 5/5] END .................max_depth=90, n_estimators=300; total time=13.2s
[CV 1/5] END .................max_depth=90, n_estimators=400; total time=15.6s
[CV 2/5] END .................max_depth=90, n_estimators=400; total time=15.9s
[CV 3/5] END .................max_depth=90, n_estimators=400; total time=16.1s
[CV 4/5] END .................max_depth=90, n_estimators=400; total time=15.7s
[CV 5/5] END .................max_depth=90, n_estimators=400; total time=15.8s
[CV 1/5] END .................max_depth=100, n_estimators=50; total time=1.9s
[CV 2/5] END .................max_depth=100, n_estimators=50; total time=2.0s
[CV 3/5] END .................max_depth=100, n_estimators=50; total time=2.0s
[CV 4/5] END .................max_depth=100, n_estimators=50; total time=2.0s
[CV 5/5] END .................max_depth=100, n_estimators=50; total time=2.0s
[CV 1/5] END ................max_depth=100, n_estimators=100; total time=4.0s
[CV 2/5] END ................max_depth=100, n_estimators=100; total time=4.0s
[CV 3/5] END ................max_depth=100, n_estimators=100; total time=4.1s
[CV 4/5] END ................max_depth=100, n_estimators=100; total time=4.0s
[CV 5/5] END ................max_depth=100, n_estimators=100; total time=4.0s
[CV 1/5] END ................max_depth=100, n_estimators=200; total time=7.8s
[CV 2/5] END ................max_depth=100, n_estimators=200; total time=7.9s
[CV 3/5] END ................max_depth=100, n_estimators=200; total time=8.1s
[CV 4/5] END ................max_depth=100, n_estimators=200; total time=7.9s
[CV 5/5] END ................max_depth=100, n_estimators=200; total time=7.8s
[CV 1/5] END ................max_depth=100, n_estimators=300; total time=11.8s
[CV 2/5] END ................max_depth=100, n_estimators=300; total time=12.0s
[CV 3/5] END ................max_depth=100, n_estimators=300; total time=12.8s
[CV 4/5] END ................max_depth=100, n_estimators=300; total time=11.4s
[CV 5/5] END ................max_depth=100, n_estimators=300; total time=11.5s
[CV 1/5] END ................max_depth=100, n_estimators=400; total time=15.1s
[CV 2/5] END ................max_depth=100, n_estimators=400; total time=15.3s
[CV 3/5] END ................max_depth=100, n_estimators=400; total time=15.6s
[CV 4/5] END ................max_depth=100, n_estimators=400; total time=15.3s
[CV 5/5] END ................max_depth=100, n_estimators=400; total time=15.3s
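The first line of the log follows directly from the size of the grid: 5 values of `n_estimators` times 3 values of `max_depth` gives 15 candidates, and 5-fold cross-validation turns that into 75 fits. The sketch below reproduces this count with scikit-learn's `ParameterGrid`. It also illustrates a Python detail worth keeping in mind when writing such grids: if a dict literal lists the same key twice (here `max_depth`, first as `[2, 3, 4, 5]` and then as `[80, 90, 100]`), the later list silently replaces the earlier one, which would explain why only depths 80, 90, and 100 show up in the log.

```python
from sklearn.model_selection import ParameterGrid

# A dict literal with a repeated key: Python keeps only the last value.
parameters = {
    'n_estimators': [50, 100, 200, 300, 400],
    'max_depth': [2, 3, 4, 5],
    'max_depth': [80, 90, 100],  # replaces the list on the previous line
}

grid = ParameterGrid(parameters)
n_candidates = len(grid)   # 5 n_estimators values * 3 max_depth values
n_folds = 5                # cv=5 in the grid search
n_fits = n_candidates * n_folds

print(parameters['max_depth'])
print(f"{n_candidates} candidates, {n_fits} fits")
```

To search both the shallow and the deep ranges, the two lists would need to be merged into a single `max_depth` entry, at the cost of proportionally more fits.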

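The random forest step ranks features by permutation importance: each feature column of the test set is shuffled in turn, and the resulting drop in the model's score is averaged over several repeats. Here is a minimal sketch on synthetic data, assuming two made-up features named `signal` and `irrelevant` (hypothetical names, not from the article's dataset) and a fixed small forest instead of a grid search.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Hypothetical toy data: the target depends on "signal" only.
rng = np.random.default_rng(0)
n = 400
X = pd.DataFrame({
    "signal": rng.normal(size=n),
    "irrelevant": rng.normal(size=n),
})
y = 3.0 * X["signal"] + rng.normal(scale=0.1, size=n)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
forest = RandomForestRegressor(n_estimators=50, random_state=0)
forest.fit(X_train, y_train)

# Shuffle each test-set column 10 times and record the mean drop in R-squared.
result = permutation_importance(forest, X_test, y_test,
                                n_repeats=10, random_state=0)
perm = pd.DataFrame({"AVG_Importance": result.importances_mean},
                    index=X_test.columns)
perm = perm.sort_values(by="AVG_Importance", ascending=False)
print(perm)
```

Because the importance is measured on held-out data, a feature the model merely memorized during training gets little credit, which is one reason to prefer this over the forest's built-in impurity-based importances.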