Implementing a Neural Network with Callbacks
A callback is an object that can perform certain actions at various stages of training. It is typically used to monitor or adjust specific behaviors and is invoked periodically throughout the training procedure. In machine learning, callbacks can be used to specify what occurs before, during, or after a training epoch or a single batch. In this note, we use the EarlyStopping, LearningRateScheduler, and ModelCheckpoint callbacks in the training of a neural network.
○ Contents
- Data
- Visualization
- Feature Unrolling
- Target Encoding
- Model Creation
- Model Compilation
- EarlyStopping Callback
- LearningRateScheduler Callback
- ModelCheckpoint Callback
- Training
- Inference
- Evaluation
- References
○ Data
In this note, we use the MNIST dataset and consider the problem of recognizing handwritten digits. There are multiple ways to load the dataset. The keras.datasets module provides it readily.
from keras.datasets import mnist
data = mnist.load_data()
A good place to understand the data is the Keras documentation for the dataset. From the documentation page, we learn that the loaded data contains a training set of sixty thousand \(28 \times 28\) grayscale images of the \(10\) digits, along with a test set of ten thousand images.
We also see from the documentation that the data comes as a tuple of NumPy arrays: (X_train, y_train), (X_test, y_test).
(X_train, y_train), (X_test, y_test) = data
We check the shape of the arrays.
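For example, printing the shapes confirms the sizes described in the documentation:
print(X_train.shape)   # (60000, 28, 28)
print(y_train.shape)   # (60000,)
print(X_test.shape)    # (10000, 28, 28)
print(y_test.shape)    # (10000,)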
○ Visualization
To visualize the handwritten digits from the raw data, we use the imshow function from the matplotlib.pyplot module.
import matplotlib.pyplot as plt
num_images = 25
fig, axes = plt.subplots(5, 5, figsize = (5, 5))
axes = axes.flatten()
for i in range(num_images):
    axes[i].imshow(X_train[i], cmap = 'gray')
    axes[i].axis('off')
    axes[i].set_title(f"Label: {y_train[i]}", fontsize = 10)
plt.tight_layout()
plt.show()
○ Feature Unrolling
To train a neural network on the data, we need to unroll the pixel values of each image into a one-dimensional array. For this reason, we reshape X_train and X_test such that the \(28 \times 28\) grayscale pixel values corresponding to each observation become an array of \(784\) values.
import numpy as np
X_train_unrolled = X_train.reshape(X_train.shape[0], np.prod(X_train.shape[1:]))
X_test_unrolled = X_test.reshape(X_test.shape[0], np.prod(X_test.shape[1:]))
We check the shape of X_train_unrolled and X_test_unrolled.
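For instance:
print(X_train_unrolled.shape)   # expected: (60000, 784)
print(X_test_unrolled.shape)    # expected: (10000, 784)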
○ Target Encoding
The target variable can assume \(10\) different labels. To train a neural network, we need to one-hot encode it; for example, the label \(3\) becomes the vector \([0, 0, 0, 1, 0, 0, 0, 0, 0, 0]\).
def vec_to_mat(y_in, num_classes = None, dtype = 'float32'):
    # one-hot encode an integer label vector into an (m, num_classes) matrix
    y_in = np.array(y_in, dtype = 'int')
    if not num_classes:
        num_classes = np.max(y_in) + 1
    m = y_in.shape[0]
    Y_out = np.zeros((m, num_classes), dtype = dtype)
    Y_out[np.arange(m), y_in] = 1
    return Y_out
Y_train_arr = vec_to_mat(y_train, num_classes = 10, dtype = 'int')
Y_test_arr = vec_to_mat(y_test, num_classes = 10, dtype = 'int')
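Note that Keras ships an equivalent utility: assuming the same label arrays, keras.utils.to_categorical(y_train, num_classes = 10) produces the same one-hot encoding.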
We check the shape of Y_train_arr and Y_test_arr.
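For instance:
print(Y_train_arr.shape)   # expected: (60000, 10)
print(Y_test_arr.shape)    # expected: (10000, 10)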
○ Model Creation
We begin by adding dense layers, each with an appropriate number of units and an activation function, to a sequential model.
from keras.models import Sequential
from keras.layers import Dense
model = Sequential()
model.add(Dense(units = 128, input_dim = X_train_unrolled.shape[1], activation = 'relu'))
model.add(Dense(units = 32, activation = 'relu'))
model.add(Dense(units = 16, activation = 'relu'))
model.add(Dense(units = 10, activation = 'softmax'))
model.summary()
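As a quick sanity check on the summary output, the dense layers should contribute \(784 \times 128 + 128 = 100480\), \(128 \times 32 + 32 = 4128\), \(32 \times 16 + 16 = 528\), and \(16 \times 10 + 10 = 170\) weights and biases, for a total of \(105306\) trainable parameters.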
○ Model Compilation
We compile the model with the binary cross-entropy loss and the Adam optimizer, and evaluate it with the accuracy metric during training.
model.compile(
    loss = 'binary_crossentropy',
    optimizer = 'adam',
    metrics = ['accuracy']
)
Note that the Adam optimizer is set with a learning rate of \(0.001\) by default. To experiment with the learning rate parameter, we can compile the model as follows.
from keras.optimizers import Adam
model.compile(
    loss = 'binary_crossentropy',
    optimizer = Adam(learning_rate = 0.001),
    metrics = ['accuracy']
)
○ EarlyStopping Callback
The EarlyStopping callback stops training when a monitored metric (here, the validation loss) has stopped improving by at least min_delta for patience consecutive epochs; the start_from_epoch argument delays this check until the given epoch.
from keras.callbacks import EarlyStopping
earlystop = EarlyStopping(
    monitor = 'val_loss',
    min_delta = 0.001,
    patience = 20,
    verbose = 0,
    mode = 'auto',
    start_from_epoch = 60
)
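Note that EarlyStopping also accepts a restore_best_weights argument (False by default); we leave it unset here and instead rely on the ModelCheckpoint callback introduced below to keep the best model.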
○ LearningRateScheduler Callback
At the beginning of every epoch, the LearningRateScheduler callback calls an input schedule function with the current epoch index and the current learning rate, and applies the returned value to the optimizer.
The modified linear schedule function below holds the learning rate constant for the first \(40\) epochs and then diminishes it linearly: the factor \((100 - epoch) / (101 - epoch)\) applied at each epoch telescopes into a linear decay toward zero at epoch \(100\).
def scheduler_modified_linear(epoch, learning_rate):
    if epoch < 40:
        return learning_rate
    else:
        return learning_rate * (100 - epoch) / (100 - epoch + 1)
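As a quick check outside of training, we can simulate how the callback would apply this schedule, assuming an initial learning rate of \(0.001\):
# simulate the schedule: the learning rate stays at 0.001 until epoch 40,
# then decreases roughly linearly to about 0.001 / 61 by epoch 99
lr = 0.001
for epoch in range(100):
    lr = scheduler_modified_linear(epoch, lr)
print(lr)   # approximately 1.6e-05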
The modified exponential schedule function holds the learning rate constant for the first \(40\) epochs and then diminishes it exponentially, multiplying it by \(e^{-0.1}\) at every subsequent epoch.
import math
def scheduler_modified_exponential(epoch, learning_rate):
    if epoch < 40:
        return learning_rate
    else:
        return learning_rate * math.exp(-0.1)
In this note, we use the modified linear schedule function.
from keras.callbacks import LearningRateScheduler
learning_rate_scheduler = LearningRateScheduler(scheduler_modified_linear)
○ ModelCheckpoint Callback
The ModelCheckpoint callback saves the Keras model or model weights at some frequency. The model or weights can be loaded later for inference or to continue the training from the saved instance.
from keras.callbacks import ModelCheckpoint
model_checkpoint = ModelCheckpoint(
    filepath = 'best_model.h5',
    monitor = 'val_loss',
    save_best_only = True
)
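With save_best_only = True, the checkpoint file is overwritten only when the monitored validation loss improves, so best_model.h5 always holds the best model observed so far during training.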
○ Training
We incorporate the three callbacks in the model.fit() function and train the model on (X_train_unrolled, Y_train_arr).
history = model.fit(
    x = X_train_unrolled,
    y = Y_train_arr,
    batch_size = 32,
    epochs = 100,
    validation_split = 0.2,
    callbacks = [earlystop, learning_rate_scheduler, model_checkpoint],
    verbose = 0
)
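Before plotting, we can list which metrics were recorded during training (the exact set of keys may vary slightly across Keras versions):
print(history.history.keys())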
Note: If we try to train the model on (X_train_unrolled, y_train) instead, we get the following error message, because the loss compares the model's \((32, 10)\) softmax output against a batch of integer labels of shape \((32, 1)\); one-hot encoding the targets resolves the shape mismatch.
ValueError: `logits` and `labels` must have the same shape, received ((32, 10) vs (32, 1)).
Now, we plot the model loss (binary crossentropy) against the epochs.
import seaborn as sns
plt.figure(figsize = (15, 10))
plt.title('Model loss (binary crossentropy)', fontsize = 14)
sns.lineplot(data = history.history['loss'], label = 'Train')
sns.lineplot(data = history.history['val_loss'], label = 'Validation')
plt.xlabel('Epoch', fontsize = 14)
plt.ylabel('Loss', fontsize = 14)
plt.legend()
plt.show()
We also plot the model accuracy against the epochs.
plt.figure(figsize = (15, 10))
plt.title('Model accuracy', fontsize = 14)
sns.lineplot(data = history.history['accuracy'], label = 'Train')
sns.lineplot(data = history.history['val_accuracy'], label = 'Validation')
plt.xlabel('Epoch', fontsize = 14)
plt.ylabel('Accuracy', fontsize = 14)
plt.legend()
plt.show()
○ Inference
We load the model instance with the best validation loss.
model.load_weights('best_model.h5')
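Alternatively, because the ModelCheckpoint callback above saved the full model rather than only its weights, the checkpoint can also be restored with keras.models.load_model('best_model.h5').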
Next, we use this model instance to predict the test class probabilities based on the unrolled test features X_test_unrolled.
y_pred_proba = model.predict(X_test_unrolled, verbose = 0)
Each row of y_pred_proba gives a vector of predicted probabilities, whose \(i\)th element is the probability that the corresponding test observation belongs to the \(i\)th target class. We convert each probability vector to a label by taking the class with the highest predicted probability.
y_pred = [np.argmax(y_pred_proba[i]) for i in range(len(y_pred_proba))]
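Equivalently, the labels can be computed in a single vectorized call:
y_pred = np.argmax(y_pred_proba, axis = 1)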
○ Evaluation
The sklearn.metrics module provides a number of measures to evaluate models. We import the accuracy_score metric and feed it the true test labels y_test and the predicted test labels y_pred.
from sklearn.metrics import accuracy_score
accuracy_score(y_test, y_pred)
We obtain an accuracy score of \(0.9733\). Next, we compute and plot the confusion matrix depicting the model performance on each class.
import pandas as pd
import seaborn as sns
from sklearn import metrics
def conf_mat(y_test, y_pred, num_class, figsize = (10, 8), font_scale = 1.2, annot_kws_size = 16):
    class_names = np.arange(num_class)
    tick_marks_y = np.arange(num_class) + 0.5
    tick_marks_x = np.arange(num_class) + 0.5
    confusion_matrix = metrics.confusion_matrix(y_test, y_pred)
    confusion_matrix_df = pd.DataFrame(confusion_matrix, range(num_class), range(num_class))
    plt.figure(figsize = figsize)
    sns.set(font_scale = font_scale) # label size
    plt.title("Confusion Matrix")
    sns.heatmap(confusion_matrix_df, annot = True, annot_kws = {"size": annot_kws_size}, fmt = 'd') # annotation font size
    plt.yticks(tick_marks_y, class_names, rotation = 'vertical')
    plt.xticks(tick_marks_x, class_names, rotation = 'horizontal')
    plt.ylabel('True label')
    plt.xlabel('Predicted label')
    plt.grid(False)
    plt.show()
conf_mat(y_test, y_pred, num_class = 10, figsize = (15, 12), font_scale = 1.2, annot_kws_size = 16)
○ References
- Accuracy
- Accuracy score documentation
- Activation function
- Adam optimizer
- Binary cross-entropy
- Confusion matrix
- Dense layer
- EarlyStopping callback
- imshow function
- Keras datasets
- Keras losses
- Keras optimizers
- Learning rate
- LearningRateScheduler callback
- MNIST database
- MNIST digits classification dataset
- ModelCheckpoint callback
- Neural network
- One-hot
- Pyplot module
- Scikit-learn metrics
- Sequential model