Stacking集成学习挑战天池新人赛【工业蒸汽量预测 】 (2) 基础类、交叉验证方法构建
【摘要】 由于后续将使用sklearn库实现大部分的初级学习模型,这里将构建一个sklern基础类,方便代码的使用和拓展。基础类构建class SklearnHelper(object): def __init__(self, clf, seed=0, params=None): params['random_state'] = seed self.clf = clf...
由于后续将使用sklearn库实现大部分的初级学习模型,这里将构建一个sklern基础类,方便代码的使用和拓展。
基础类构建
# 预测结果以mean square error作为评判标准(均方差越小越好)
from sklearn.metrics import mean_squared_error
class SklearnHelper(object):
def __init__(self, clf, seed=0, params=None):
params['random_state'] = seed
self.clf = clf(**params)
def train(self, x_train, y_train, x_val, y_val):
self.clf.fit(x_train, y_train)
y_pre = self.predict(x_val)
return mean_squared_error(y_val, y_pre)
def fit(self, x_train, y_train):
return self.clf.fit(x_train, y_train)
def predict(self, x):
return self.clf.predict(x)
def feature_importances(self):
print(self.clf.feature_importances_)
交叉验证方法构建
from sklearn.model_selection import KFold
def get_oof(clf, x_train, y_train, x_test, n_folds = 5):
"""K-fold stacking"""
num_train, num_test = x_train.shape[0], x_test.shape[0]
oof_train = np.zeros((num_train,))
oof_test = np.zeros((num_test,))
oof_test_all_fold = np.zeros((num_test, n_folds))
scores = []
KF = KFold(n_splits = n_folds, random_state=2017)
for i, (train_index, val_index) in enumerate(KF.split(x_train)):
print('{0} fold, train {1}, val {2}'.format(i, len(train_index), len(val_index)))
x_tra, y_tra = x_train[train_index], y_train[train_index]
x_val, y_val = x_train[val_index], y_train[val_index]
score = clf.train(x_tra, y_tra, x_val, y_val)
scores.append(score)
oof_train[val_index] = clf.predict(x_val)
oof_test_all_fold[:, i] = clf.predict(x_test)
oof_test = np.mean(oof_test_all_fold, axis=1)
print('all scores {0}, average {1}'.format(scores, np.mean(scores)))
return oof_train, oof_test
下一篇介绍初级学习模型构建
【版权声明】本文为华为云社区用户原创内容,未经允许不得转载,如需转载请自行联系原作者进行授权。如果您发现本社区中有涉嫌抄袭的内容,欢迎发送邮件进行举报,并提供相关证据,一经查实,本社区将立刻删除涉嫌侵权内容,举报邮箱:
cloudbbs@huaweicloud.com
- 点赞
- 收藏
- 关注作者
评论(0)