Stacking Ensemble Learning for the Tianchi Beginner Competition [Industrial Steam Volume Prediction] (5): Parameter Tuning

Posted by 地上一只鹅~ on 2018/12/30 15:06:38


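Earlier articles in this series covered building the primary (base) learners and the secondary (meta) learner for stacking, and generated predictions for the steam volume. This article tunes the hyperparameters of the models used there, applying sklearn's GridSearchCV to RandomForestRegressor, GradientBoostingRegressor, and xgboost's XGBRegressor. For lack of time, only simple examples are shown below.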

XGBRegressor

import xgboost as xgb
from sklearn.model_selection import GridSearchCV

# Grid of values to search.
cv_params = {'n_estimators': range(100, 200, 20), "max_depth": range(5, 10), "learning_rate": [0.1, 0.3, 0.4, 0.5, 0.7, 0.9], "min_child_weight": range(5, 10)}
# Starting values; keys that also appear in cv_params are overridden during the search.
other_params = {'learning_rate': 0.3, 'n_estimators': 20, 'max_depth': 8, 'min_child_weight': 6, 'seed': 0,
                    'subsample': 0.9, 'colsample_bytree': 0.9, 'gamma': 0.6, 'reg_alpha': 0.05, 'reg_lambda': 0.1}
xg_model = xgb.XGBRegressor(**other_params)
# train_x and train_y are the training features and target prepared in the earlier articles.
optimized_xgb = GridSearchCV(estimator=xg_model, param_grid=cv_params, scoring='neg_mean_squared_error', cv=5, verbose=1, n_jobs=4)
optimized_xgb.fit(train_x, train_y)
evaluate_result = optimized_xgb.cv_results_
# With this scoring, best_score_ is the negated MSE.
best_params_ = 'Best parameter values: {0}'.format(optimized_xgb.best_params_)
best_score_ = 'Best model score: {0}'.format(optimized_xgb.best_score_)
print(evaluate_result)
print(best_params_)
print(best_score_)
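
Before launching a grid search it helps to know how many fits it will trigger. The sketch below counts the candidates in the XGBRegressor grid above with sklearn's ParameterGrid; the variable names n_candidates, n_folds and n_fits are introduced here for illustration only.

```python
from sklearn.model_selection import ParameterGrid

# Same grid as in the XGBRegressor example; ranges are expanded to lists.
cv_params = {
    "n_estimators": list(range(100, 200, 20)),
    "max_depth": list(range(5, 10)),
    "learning_rate": [0.1, 0.3, 0.4, 0.5, 0.7, 0.9],
    "min_child_weight": list(range(5, 10)),
}

# Every combination of values is one candidate.
n_candidates = len(ParameterGrid(cv_params))

# With cv=5, each candidate is fitted once per fold.
n_folds = 5
n_fits = n_candidates * n_folds

print("candidates:", n_candidates)
print("total fits:", n_fits)
```

If that count is too large for the available time, shrinking one of the value lists reduces it multiplicatively.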

RandomForestRegressor

from sklearn.ensemble import RandomForestRegressor

# Float values of min_samples_split are fractions of the training samples.
cv_params = {'n_estimators': range(100, 200, 20), "max_depth": range(5, 10), "min_samples_split": [0.1, 0.3, 0.4, 0.5, 0.7, 0.9]}
# max_features=1.0 uses all features; for regressors this matches the old 'auto' value, which newer sklearn releases reject.
other_params = {'n_estimators': 20, 'max_depth': 8, 'min_samples_split': 0.1, "min_samples_leaf": 1, "max_features": 1.0}
rf_model = RandomForestRegressor(**other_params)
optimized_rf = GridSearchCV(estimator=rf_model, param_grid=cv_params, scoring='neg_mean_squared_error', cv=5, verbose=1, n_jobs=4)
optimized_rf.fit(train_x, train_y)
evaluate_result = optimized_rf.cv_results_
best_params_ = 'Best parameter values: {0}'.format(optimized_rf.best_params_)
best_score_ = 'Best model score: {0}'.format(optimized_rf.best_score_)
print(best_params_)
print(best_score_)

GradientBoostingRegressor

from sklearn.ensemble import GradientBoostingRegressor

# Float values of min_samples_split are fractions of the training samples.
cv_params = {'n_estimators': range(100, 200, 20), "max_depth": range(5, 10), "min_samples_split": [0.1, 0.3, 0.4, 0.5, 0.7, 0.9]}
# max_features=None uses all features; this matches the old 'auto' value, which newer sklearn releases reject.
other_params = {'n_estimators': 20, 'max_depth': 8, 'min_samples_split': 0.1, "min_samples_leaf": 1, "max_features": None}
gb_model = GradientBoostingRegressor(**other_params)
optimized_gb = GridSearchCV(estimator=gb_model, param_grid=cv_params, scoring='neg_mean_squared_error', cv=5, verbose=1, n_jobs=4)
optimized_gb.fit(train_x, train_y)
evaluate_result = optimized_gb.cv_results_
best_params_ = 'Best parameter values: {0}'.format(optimized_gb.best_params_)
best_score_ = 'Best model score: {0}'.format(optimized_gb.best_score_)
print(best_params_)
print(best_score_)
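
All three searches use scoring='neg_mean_squared_error', so the reported scores are negated mean squared errors: values closer to zero are better, and flipping the sign recovers the MSE. The following is a minimal sketch of reading the results, run on synthetic data from make_regression with a deliberately small grid; the data, grid and variable names are assumptions for illustration, not the competition setup.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for train_x / train_y (an assumption, not the competition data).
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)

search = GridSearchCV(
    estimator=RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [20, 40], "max_depth": [3, 5]},
    scoring="neg_mean_squared_error",
    cv=3,
)
search.fit(X, y)

# best_score_ is the negated MSE; flip the sign to get the MSE itself.
best_mse = -search.best_score_
best_rmse = np.sqrt(best_mse)
print("best params:", search.best_params_)
print("best CV MSE:", best_mse, "RMSE:", best_rmse)

# List every candidate from best to worst using the rank column of cv_results_.
results = search.cv_results_
order = np.argsort(results["rank_test_score"])
for i in order:
    print(results["params"][i], -results["mean_test_score"][i])
```

Printing the ranked candidates is usually easier to read than printing the whole cv_results_ dictionary.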

Building the Secondary Learner

Use the tuned primary learners to generate the training and test data for the secondary learner.

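import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# get_oof is the out-of-fold helper defined in an earlier article of this series.
# Pass the tuned estimators (best_estimator_); rf_model, gb_model and xg_model still hold the untuned starting parameters.
rf_oof_train, rf_oof_test = get_oof(optimized_rf.best_estimator_, train_x, train_y, test_x)
gb_oof_train, gb_oof_test = get_oof(optimized_gb.best_estimator_, train_x, train_y, test_x)
xgb_oof_train, xgb_oof_test = get_oof(optimized_xgb.best_estimator_, train_x, train_y, test_x)

input_train = [rf_oof_train, gb_oof_train, xgb_oof_train]
input_test = [rf_oof_test, gb_oof_test, xgb_oof_test]

# Stack the base predictions column-wise into the secondary learner's feature matrix.
stacked_train = np.concatenate([f.reshape(-1, 1) for f in input_train], axis=1)
stacked_test = np.concatenate([f.reshape(-1, 1) for f in input_test], axis=1)

final_model = LinearRegression()
final_model.fit(stacked_train, train_y)
test_prediction = final_model.predict(stacked_test)
# test_y is assumed to be the held-out target prepared in the earlier articles.
print(mean_squared_error(test_y, test_prediction))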
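
The get_oof helper used above is defined in an earlier article of this series and is not shown here. For readers without that article at hand, the following is a minimal sketch of what such an out-of-fold helper might look like. The signature, the n_splits and seed parameters, and the choice to average the test predictions across folds are all assumptions, and the original helper may differ.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold


def get_oof(model, train_x, train_y, test_x, n_splits=5, seed=0):
    """Hypothetical out-of-fold helper; the series' own version may differ."""
    train_x, train_y, test_x = np.asarray(train_x), np.asarray(train_y), np.asarray(test_x)
    oof_train = np.zeros(train_x.shape[0])
    oof_test_folds = np.zeros((n_splits, test_x.shape[0]))
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for i, (fit_idx, holdout_idx) in enumerate(kf.split(train_x)):
        # Fit a fresh copy on the other folds so each row is predicted
        # by a model that never saw it.
        fold_model = clone(model)
        fold_model.fit(train_x[fit_idx], train_y[fit_idx])
        oof_train[holdout_idx] = fold_model.predict(train_x[holdout_idx])
        oof_test_folds[i] = fold_model.predict(test_x)
    # Average the per-fold predictions for the test set.
    return oof_train, oof_test_folds.mean(axis=0)


# Usage on small synthetic data (illustrative only).
rng = np.random.RandomState(0)
demo_x = rng.rand(50, 3)
demo_y = demo_x @ np.array([1.0, 2.0, 3.0])
demo_test = rng.rand(10, 3)
demo_oof_train, demo_oof_test = get_oof(LinearRegression(), demo_x, demo_y, demo_test)
print(demo_oof_train.shape, demo_oof_test.shape)
```

Because every training row is predicted by a model fitted without it, the secondary learner trains on predictions that resemble what the base models produce on unseen data.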

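The model performs slightly better than before tuning, but the improvement is not significant. A later article will cover tuning on the feature data.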
