Machine Learning: Model Tuning
Cross-Validation
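Split the available training data into a training set and a validation set. On each round a different part of the data serves as the validation set, and the average of the scores across all rounds is taken as the final result.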
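The rotation described above can be sketched with scikit-learn's KFold and cross_val_score. This is only a minimal illustration: the iris dataset, the 5 folds, the KNN classifier, and the random_state value are assumptions chosen for the example, not part of the definition.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()

# 5 folds: each fold serves as the validation set exactly once.
# shuffle=True because the iris samples are ordered by class.
folds = KFold(n_splits=5, shuffle=True, random_state=6)

# One accuracy score per fold.
fold_scores = cross_val_score(
    KNeighborsClassifier(n_neighbors=5), iris.data, iris.target, cv=folds)

# The cross-validated result is the mean over the folds.
mean_score = fold_scores.mean()
print(fold_scores, mean_score)
```

Averaging over several validation sets makes the score less dependent on one lucky or unlucky split.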
Hyperparameter Tuning: Grid Search
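Parameters that usually have to be specified by hand, such as the value of K in the k-nearest neighbors algorithm, are called hyperparameters. Grid search presets several candidate hyperparameter combinations for the model and evaluates each combination with cross-validation. The best-scoring combination is then used to build the final model.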
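Written out by hand, the procedure is roughly a loop over the candidates with one cross-validation run per candidate. The sketch below assumes the iris dataset, 5 folds, and an illustrative candidate list for K; it is meant to show the idea, not how any library implements it internally.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
candidate_ks = [1, 3, 5, 7]  # illustrative candidate values for K

# Evaluate every candidate with 5-fold cross-validation.
mean_scores = {}
for k in candidate_ks:
    scores = cross_val_score(
        KNeighborsClassifier(n_neighbors=k), iris.data, iris.target, cv=5)
    mean_scores[k] = scores.mean()

# Keep the candidate with the highest mean validation score.
best_k = max(mean_scores, key=mean_scores.get)
print(mean_scores, best_k)
```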
API
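- sklearn.model_selection.GridSearchCV(estimator, param_grid, cv=None)
  - estimator: the estimator object to tune
  - param_grid: the estimator parameters to search, given as a dict that maps parameter names to lists of candidate values
    - e.g. {'n_neighbors': [1, 3, 5]}
  - cv: the number of cross-validation folds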
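When param_grid lists more than one parameter, every combination of the listed values is tried. A small sketch using scikit-learn's ParameterGrid helper to enumerate them; the second parameter, weights, is an assumption added here only to show how the combinations multiply.

```python
from sklearn.model_selection import ParameterGrid

# Two parameters: 3 candidate values for n_neighbors, 2 for weights.
param_grid = {"n_neighbors": [1, 3, 5], "weights": ["uniform", "distance"]}

# ParameterGrid yields one dict per combination.
combinations = list(ParameterGrid(param_grid))
for combo in combinations:
    print(combo)

print(len(combinations))  # 3 x 2 = 6 combinations
```

With cv folds, each combination is fitted cv times, so the total number of fits grows quickly as parameters are added to the grid.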
Code example: tuning the k value of the KNN algorithm with cross-validation and grid search
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV

# Load the iris dataset and hold out 20% of it as a test set
iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=6)

# Standardize the features; the scaler is fitted on the training set only
transfer = StandardScaler()
x_train = transfer.fit_transform(x_train)
x_test = transfer.transform(x_test)

estimator = KNeighborsClassifier()
# Parameter dictionary: candidate values for k
params = {'n_neighbors': [1, 3, 5, 7]}
# Grid search with 10-fold cross-validation
estimator = GridSearchCV(estimator, param_grid=params, cv=10)
estimator.fit(x_train, y_train)

# Evaluate the refitted best model on the held-out test set
y_predict = estimator.predict(x_test)
score = estimator.score(x_test, y_test)
print("score=", score)
# Best parameters, best mean cross-validation score, and the refitted best estimator
print(estimator.best_params_, estimator.best_score_, estimator.best_estimator_)
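Beyond the single best result, a fitted GridSearchCV also keeps the per-candidate scores in its cv_results_ attribute. The self-contained sketch below repeats the setup from the example under the hypothetical variable name search and prints the mean and standard deviation of the validation score for each k.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=6)
x_train = StandardScaler().fit_transform(x_train)

search = GridSearchCV(
    KNeighborsClassifier(), param_grid={"n_neighbors": [1, 3, 5, 7]}, cv=10)
search.fit(x_train, y_train)

# cv_results_ holds one entry per candidate in the grid.
results = search.cv_results_
for params, mean, std in zip(
        results["params"], results["mean_test_score"], results["std_test_score"]):
    print(params, round(mean, 3), round(std, 3))
```

Comparing the means alongside their spread helps judge whether the winning k is clearly better or only marginally ahead of its neighbors.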