MARRtino face mask classifier

About the project

The goal of this project is to train a neural network to tell apart people who wear a face mask correctly from those who either don’t wear one or wear it incorrectly.

The main components of this project are the Jupyter notebooks, which cover the training phase of the model as well as the usage and testing of the trained model.

You can find the code and the considerations about this project in the following GitHub repository.

Considerations about the dataset 💽

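The dataset used to train my model can be found on Kaggle at this page. It contains images belonging to three classes: people wearing a face mask correctly, people not wearing a face mask, and people wearing a face mask incorrectly.

In total it contains 8982 photos, split evenly into three folders (one per class) of 2994 images each.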

This makes the dataset perfectly balanced, with every class represented by the same number of samples. However, including a class for people wearing their mask incorrectly makes the model less “rigid” than a plain mask/no-mask classifier, while also introducing some noise, since some poses are ambiguous.
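A quick way to check this balance is to count the files in each class folder. The following is a minimal sketch using only the Python standard library; it assumes the dataset is laid out as one sub-folder per class (as the per-folder counts above suggest) and that images use common extensions. The function names and the accepted suffixes are hypothetical, not taken from the project notebook.

```python
from __future__ import annotations

from pathlib import Path

# Assumed image extensions; adjust if the dataset uses other formats.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images_per_class(root: str | Path) -> dict[str, int]:
    """Map each class sub-folder of `root` to the number of image files it holds."""
    root = Path(root)
    counts: dict[str, int] = {}
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1
            for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


def is_balanced(counts: dict[str, int]) -> bool:
    """True when every class has exactly the same number of samples."""
    return len(set(counts.values())) <= 1
```

For this dataset, a balanced result would be three folders with 2994 images each, 8982 in total.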

The following is a sample of 64 images plotted from the dataset:

Plotted using Matplotlib

Choosing the model to be trained 🧠

The model chosen for this image classification task is ResNet:

The Residual Network, or ResNet for short, is a model that makes use of the residual module. There are several variants of different sizes, including ResNet18, ResNet34, ResNet50, ResNet101, and ResNet152, all of which are available in torchvision.models. This project uses ResNet18.

When deeper networks are able to start converging, a degradation problem appears: as network depth increases, accuracy saturates and then degrades rapidly. This degradation is not caused by overfitting, since adding more layers to a suitably deep model leads to a higher training error, not just a higher validation error. Degradation indicates that not all systems are equally easy to optimize.

The deep residual learning framework addresses the degradation problem. Instead of hoping that each few stacked layers directly fit a desired underlying mapping, these layers are explicitly made to fit a residual mapping.
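In formulas: if H(x) is the mapping a few stacked layers should learn, a residual block lets them learn F(x) = H(x) − x instead and outputs F(x) + x through a shortcut connection. Below is a minimal PyTorch sketch of the two-convolution "basic" block that ResNet18 and ResNet34 are built from, written for illustration only; in practice torchvision's own implementation is used, and the class name here is hypothetical.

```python
import torch
from torch import nn


class BasicResidualBlock(nn.Module):
    """Two 3x3 convolutions learn F(x); the block outputs relu(F(x) + shortcut(x))."""

    def __init__(self, in_channels: int, out_channels: int, stride: int = 1) -> None:
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        if stride != 1 or in_channels != out_channels:
            # Projection shortcut: a 1x1 conv matches the shape of F(x).
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )
        else:
            # Identity shortcut: x is added back unchanged.
            self.shortcut = nn.Identity()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        identity = self.shortcut(x)
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))  # this is F(x), the residual
        return self.relu(out + identity)  # F(x) + x
```

If the layers drive F(x) towards zero, the block simply passes its input through, which is why adding such blocks should not make the training error worse.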


Training the model with PyTorch 🔥

The framework used to train the model is PyTorch, which handles both image preprocessing and model training. My implementation is in the ‘face-mask-detection-marrtino.ipynb’ notebook inside the repository.

During my tests I trained both a ResNet model whose weights were pre-trained on the ImageNet dataset and a non-pre-trained model that shares only the ResNet architecture, starting from randomly initialized weights.

Basically, I discovered the concept of transfer learning by accident: using a model that has been pre-trained on a similar problem leads to visibly better results, which I show in the next section.

Pre-trained vs plain ResNet18 results 🥊

Both the pre-trained and the non-pre-trained ResNet18 models showed good results, reaching peak accuracies above 0.96. Moreover, the loss decreased almost continuously in both the training and the validation phases.

That said, there is still a substantial difference between the results of the two networks.

The pre-trained one starts with a very low loss and a very high accuracy, and reaches its plateau in fewer than 5 epochs.

The non-pre-trained one instead follows a more gradual evolution, starting with a much higher loss and a lower accuracy, and reaching its plateau between the 5th and the 10th epoch.
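The loss and accuracy values compared here are per-epoch figures on the training and validation sets. A minimal sketch of how the validation side of such numbers can be computed in PyTorch follows; `evaluate` is a hypothetical helper, and it assumes a classifier that returns one logit per class and a loader yielding `(images, labels)` batches.

```python
import torch
from torch import nn


@torch.no_grad()
def evaluate(model: nn.Module, loader, criterion: nn.Module,
             device: str = "cpu") -> tuple[float, float]:
    """Return (mean loss, accuracy) of `model` on `loader` without updating weights."""
    model.eval()  # BatchNorm uses running statistics, dropout is disabled
    total_loss, correct, seen = 0.0, 0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        logits = model(images)
        total_loss += criterion(logits, labels).item() * labels.size(0)
        correct += (logits.argmax(dim=1) == labels).sum().item()
        seen += labels.size(0)
    return total_loss / seen, correct / seen
```

Calling it once per epoch on the validation loader gives one point of each validation curve.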

The plots shown here were created by logging the training data to the Weights & Biases platform. More detailed documentation is available in the training reports folder of this repository.

Checking the model’s performance 🔍

To measure the performance of the resulting model (in this case the non-pre-trained one), I computed a confusion matrix of its predictions on the whole dataset. As the image below shows, the results are concentrated on the main diagonal, meaning that most of the model's predictions are correct, with only a small percentage (approx. 1%) of incorrect guesses:

Plotted using sklearn, pandas and seaborn
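A confusion matrix like this one can be computed with scikit-learn, wrapped in a pandas DataFrame for readable row and column labels, and drawn as a seaborn heatmap. The sketch assumes the true and predicted labels have already been collected as integer class indices; the names in `CLASS_NAMES` (and their order) are hypothetical and should match the dataset's folder order. `error_rate` computes the share of off-diagonal entries, which is the quantity behind the approximately 1% figure above.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.metrics import confusion_matrix

# Hypothetical class names, in the same order as the integer labels.
CLASS_NAMES = ["mask_correct", "no_mask", "mask_incorrect"]


def confusion_dataframe(y_true, y_pred, class_names=CLASS_NAMES) -> pd.DataFrame:
    """Rows are true classes, columns are predicted classes."""
    cm = confusion_matrix(y_true, y_pred, labels=list(range(len(class_names))))
    return pd.DataFrame(cm, index=class_names, columns=class_names)


def error_rate(cm_df: pd.DataFrame) -> float:
    """Fraction of samples that fall outside the main diagonal."""
    values = cm_df.to_numpy()
    return float(1.0 - np.trace(values) / values.sum())


def plot_confusion(cm_df: pd.DataFrame):
    """Draw the matrix as an annotated heatmap and return the figure."""
    fig, ax = plt.subplots(figsize=(6, 5))
    sns.heatmap(cm_df, annot=True, fmt="d", cmap="Blues", ax=ax)
    ax.set_xlabel("Predicted label")
    ax.set_ylabel("True label")
    fig.tight_layout()
    return fig
```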