Model selection and cross-validation strategy
We train our logistic regression model on the training data, and we hold out part of that training data as validation data for choosing the model's regularization parameter. Once the model has been selected on the basis of the validation metrics, we check its performance on the test data.
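A minimal sketch of this train/validation/test workflow, assuming scikit-learn and a synthetic dataset from `make_classification` in place of the real data. The candidate grid `C_grid` and the split sizes are illustrative choices, not the author's.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset (assumption: binary labels).
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Hold out a test set first; it is used only once, at the very end.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Carve a validation set out of the training data for choosing C.
X_fit, X_val, y_fit, y_val = train_test_split(
    X_train, y_train, test_size=0.25, stratify=y_train, random_state=0)

# Hypothetical grid of inverse regularization strengths (smaller C = stronger penalty).
C_grid = np.logspace(-3, 3, 7)
val_acc = []
for C in C_grid:
    model = LogisticRegression(C=C, max_iter=1000).fit(X_fit, y_fit)
    val_acc.append(model.score(X_val, y_val))

# Pick the C with the best validation accuracy.
best_C = C_grid[int(np.argmax(val_acc))]

# Refit on the whole training split with the chosen C, then score the test set once.
final_model = LogisticRegression(C=best_C, max_iter=1000).fit(X_train, y_train)
test_acc = final_model.score(X_test, y_test)
print(f"best C = {best_C:g}, test accuracy = {test_acc:.3f}")
```

Refitting on the full training split after choosing `C` is one common choice; keeping the model fitted on `X_fit` alone would also be consistent with the workflow described above.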
Stratified k-fold cross-validation creates train and test/validation folds while preserving the class distribution in every fold, so each fold reflects the class distribution of the population, or in this case of the training data.
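from sklearn.model_selection import StratifiedShuffleSplit

# Define the stratified cross-validation object.
# Note: StratifiedShuffleSplit draws K independent stratified train/validation
# splits (10% validation by default), so unlike StratifiedKFold the
# validation sets are not guaranteed to be disjoint.
K = 10  # number of cross-validation splits
cv_estimator = StratifiedShuffleSplit(n_splits=K, random_state=489567)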
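To see the stratification at work, the sketch below applies the same `StratifiedShuffleSplit` settings to a hypothetical imbalanced label vector (90 negatives, 10 positives) and records the positive-class rate in every training and validation split. The toy labels and the all-zero feature matrix are assumptions for illustration only.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Hypothetical imbalanced labels: 90 negatives, 10 positives (10% positive rate).
y = np.array([0] * 90 + [1] * 10)
X = np.zeros((len(y), 1))  # features play no role in how stratified splits are drawn

K = 10
cv_estimator = StratifiedShuffleSplit(n_splits=K, random_state=489567)

rates = []
for train_idx, val_idx in cv_estimator.split(X, y):
    # Positive-class rate in this split's training and validation parts.
    rates.append((y[train_idx].mean(), y[val_idx].mean()))

for i, (train_rate, val_rate) in enumerate(rates):
    print(f"split {i}: train {train_rate:.2f}, validation {val_rate:.2f}")
```

With the default 10% validation size, each validation split here holds one positive among ten samples, matching the 10% rate of the full label vector.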
Defining the helper functions
import numpy as np
import matplotlib.pyplot as plt

def plot_val_curve(train_scores, val_scores, param_range, plt_title):
    # Mean and spread of the scores across CV splits, one value per C
    train_scores_mean = np.mean(train_scores, axis=1)
    train_scores_std = np.std(train_scores, axis=1)
    val_scores_mean = np.mean(val_scores, axis=1)
    val_scores_std = np.std(val_scores, axis=1)

    plt.figure(figsize=(14, 6))
    plt.title(plt_title)
    plt.xlabel("$C$ (regularization parameter)")
    plt.ylabel("Accuracy")
    lw = 2
    # Log-scaled x axis, since C is typically searched over powers of ten
    plt.semilogx(param_range, train_scores_mean, label="Training score",
                 color="darkorange", lw=lw)
    plt.fill_between(param_range, train_scores_mean - train_scores_std,
                     train_scores_mean + train_scores_std, alpha=0.2,
                     color="darkorange", lw=lw)
    plt.semilogx(param_range, val_scores_mean, label="Cross-validation score",
                 color="navy", lw=lw)
    plt.fill_between(param_range, val_scores_mean - val_scores_std,
                     val_scores_mean + val_scores_std, alpha=0.2,
                     color="navy", lw=lw)
    plt.legend(loc="best")
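`plot_val_curve` expects score matrices with one row per value in `param_range` and one column per CV split, which is the layout returned by scikit-learn's `validation_curve`. A sketch of producing them, assuming synthetic data and a hypothetical `C` grid, and picking the `C` with the highest mean validation accuracy:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedShuffleSplit, validation_curve

# Synthetic data and a hypothetical grid of C values (assumptions).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
param_range = np.logspace(-3, 2, 6)
cv = StratifiedShuffleSplit(n_splits=10, random_state=489567)

# Rows index the C values, columns index the CV splits.
train_scores, val_scores = validation_curve(
    LogisticRegression(max_iter=1000), X, y,
    param_name="C", param_range=param_range,
    cv=cv, scoring="accuracy")

val_mean = val_scores.mean(axis=1)
best_C = param_range[int(np.argmax(val_mean))]
print("best C:", best_C)
# plot_val_curve(train_scores, val_scores, param_range, "Validation curve")
```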
from sklearn.model_selection import learning_curve

def plot_learning_curve(X, y, clf_estimator, cv_estimator, scorer, xlabel=''):
    # Scores for 10 training-set sizes, from 10% to 100% of each CV training split
    train_x_axis, train_scores, test_scores = learning_curve(
        estimator=clf_estimator,
        X=X,
        y=y,
        train_sizes=np.linspace(0.1, 1.0, 10),
        cv=cv_estimator,
        scoring=scorer,
        exploit_incremental_learning=False,
        n_jobs=-1)  # use all available CPU cores

    # Average over the CV splits for each training-set size
    train_mean = np.mean(train_scores, axis=1)
    train_std = np.std(train_scores, axis=1)
    test_mean = np.mean(test_scores, axis=1)
    test_std = np.std(test_scores, axis=1)

    plt.plot(train_x_axis, train_mean,
             color='blue', marker='o',
             markersize=5, label='training accuracy')
    plt.fill_between(train_x_axis,
                     train_mean + train_std,
                     train_mean - train_std,
                     alpha=0.15, color='blue')
    plt.plot(train_x_axis, test_mean,
             color='green', linestyle='--',
             marker='s', markersize=5,
             label='validation accuracy')
    plt.fill_between(train_x_axis,
                     test_mean + test_std,
                     test_mean - test_std,
                     alpha=0.15, color='green')
    plt.grid()
    plt.xlabel(xlabel)
    plt.ylabel('Accuracy')
    plt.legend(loc='lower right')
    plt.tight_layout()
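`learning_curve` returns the absolute training-set sizes plus two score matrices (one row per size, one column per split), which `plot_learning_curve` averages and plots. The sketch below, on assumed synthetic data with `C=1.0` chosen arbitrarily, prints the train-minus-validation gap instead of plotting it. A gap that stays large as the training set grows usually signals overfitting, which for this model suggests a smaller `C` (stronger regularization).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedShuffleSplit, learning_curve

# Synthetic data (assumption); 10 stratified splits as in the text.
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
cv = StratifiedShuffleSplit(n_splits=10, random_state=489567)

# Same call pattern as plot_learning_curve: 10 training sizes from 10% to 100%.
sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(C=1.0, max_iter=1000), X, y,
    train_sizes=np.linspace(0.1, 1.0, 10), cv=cv, scoring="accuracy")

# Difference between mean training and mean validation accuracy per size.
gap = train_scores.mean(axis=1) - val_scores.mean(axis=1)
for n, g in zip(sizes, gap):
    print(f"{int(n):4d} training samples: train - validation accuracy = {g:+.3f}")
```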