Ever felt frustrated by a simple card game? Why not cheat with machine learning!
This month at Fuzzy Labs we’re participating in ZenML’s Month of MLOps competition. The goal is to come up with an interesting machine learning problem, and use it to build a production-grade MLOps pipeline using ZenML.
We’ve set out to train a model that can play the card game Dobble. In Dobble, players are presented with two cards. Each card has eight symbols, and one of those symbols will appear on both cards; the first player to call out the match wins.
To make it even more interesting, our model will run on the edge, specifically an NVIDIA Jetson Nano. Edge-based machine learning is gaining in popularity, and there are plenty of new challenges that come with it.
In this blog series we’ll bring you along for the journey, as we build our pipeline, and train and deploy our model. There are three parts to this series:
The data science: how our model works and how we trained it.
The pipeline: a deep dive into the ZenML pipeline that ties everything together, going from data input to model deployment.
The edge deployment: how we deploy the final model to a Jetson Nano.
Labelling the data
There are 57 different symbols that can appear on a Dobble card, with each card having 8 unique symbols. These include symbols such as zebras, spiders, and trees.
So, our starting point was to train a model that can recognise all 57 symbols. To do that, we needed to create a labelled dataset for training. Luckily, we didn’t have to create it from scratch! A dataset of Dobble card images already exists, and usefully, it includes photos taken in different lighting conditions, which improves model generalisability.
But the images weren’t quite suited to our task. While the dataset gives us photos of entire cards, we needed to identify individual symbols. So we needed to label the images by drawing bounding boxes around every single symbol on every single photo. We used Labelbox to help with this (other annotation tools exist), and though it was labour-intensive (let’s not get into drawing boxes using trackpads 🤯), we got what we needed, and were ready to move on to training a model.
Sketching a solution
At this point, we had the data in place. This meant that we could begin to sketch out an approach to training the model.
A model should always be designed with the end user experience in mind, so we began by thinking this through carefully. We imagined our user placing two Dobble cards on a desk, with a camera looking at them from above. That being the case, we knew we’d need to first identify and isolate each card, before passing each card to the model. The model would then need to tell us what symbols it can see, and from that we can find the matching symbol.
Isolating the cards can be done with standard computer vision techniques, mostly contour-finding, with no machine learning needed. The real magic will come from the object recognition model, which will need to output a set of symbols. We’ll have two symbol sets, one for each card, and that means the last step is to find the set intersection, or in other words, find the symbols that are common between the cards. And hopefully there will only be one common symbol!
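The final matching step can be sketched in a few lines of Python. This is a minimal illustration rather than our production code: the function name find_matches and the symbol labels are made up for the example, and we assume the detector has already given us a list of labels for each card.

```python
def find_matches(card_a_symbols, card_b_symbols):
    """Return the symbols detected on both cards, sorted for stable output."""
    return sorted(set(card_a_symbols) & set(card_b_symbols))


# Hypothetical detector output for two cards (labels invented for illustration).
card_a = ["zebra", "spider", "tree", "cactus", "banana", "anchor", "key", "moon"]
card_b = ["zebra", "clock", "igloo", "heart", "bomb", "dolphin", "pencil", "carrot"]

matches = find_matches(card_a, card_b)
if len(matches) == 1:
    print(f"Match: {matches[0]}")
else:
    # Zero or several matches suggests the detector missed or misread a symbol.
    print(f"Ambiguous result: {matches}")
```

Checking the size of the intersection is a cheap sanity check: anything other than exactly one common symbol tells us the detections for that frame can’t be trusted.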
Packaging the data
Unlike image classification models, whose training data tends to be pretty straightforward, the data for an object detection model needs special consideration. For every Dobble card image, we’ll have a collection of bounding boxes that encode the coordinates of each symbol. Additionally, we’ll have textual labels that tell us what those symbols are, e.g. zebra, banana, cactus and so forth.
We packaged our training data into a format called Visual Object Classes (VOC). Actually, it’s Pascal VOC (no, not the programming language!). The VOC format stores bounding boxes plus labels alongside each image. We used ZenML to create a data preparation pipeline that outputs the dataset in this format, to be picked up by the training pipeline later.
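To give a feel for the format, here is a rough sketch of building a Pascal VOC style annotation with Python’s standard library. A VOC annotation is typically one XML file per image; the sketch writes only a subset of the usual fields, and the helper name voc_annotation, the file name, and the coordinates are all invented for illustration rather than taken from our pipeline.

```python
import xml.etree.ElementTree as ET


def voc_annotation(filename, width, height, objects):
    """Build a Pascal VOC style annotation tree.

    `objects` is a list of (label, xmin, ymin, xmax, ymax) tuples in pixels.
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename

    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    ET.SubElement(size, "depth").text = "3"  # assumption: RGB images

    for label, xmin, ymin, xmax, ymax in objects:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = label
        box = ET.SubElement(obj, "bndbox")
        for tag, value in zip(("xmin", "ymin", "xmax", "ymax"),
                              (xmin, ymin, xmax, ymax)):
            ET.SubElement(box, tag).text = str(value)
    return root


# Hypothetical image and boxes, purely for illustration.
annotation = voc_annotation(
    "card_001.jpg", 640, 480,
    [("zebra", 120, 80, 210, 170), ("cactus", 300, 220, 380, 310)],
)
print(ET.tostring(annotation, encoding="unicode"))
```

Each object element pairs a textual label with the pixel coordinates of its box, which is the information an object detection model needs during training.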
Training a model
With our approach sketched out, we could start to iterate on the model. At a high level, we needed to train a model to detect, locate, and classify each of the symbols on a Dobble card, and it would need to work in real time against a video feed.
There are a number of commonly used methods for real-time object recognition, like YOLO (You Only Look Once) and SSD (Single Shot MultiBox Detector). Both of these work by taking a single look at an image, and using that one look to find all the objects that it can recognise. One of the big factors distinguishing them is performance: YOLO can handle very high video frame rates, but often with less accuracy. In our case, high frame rate isn’t a priority, as we only need a couple of frames per second. On the other hand, we care more about accuracy. For these reasons we decided to take the SSD approach.
As is common practice with vision models, we’re using a pre-trained model to jump-start our training. The SSD model architecture includes a backbone model, which in our case is MobileNet v1, and a head, which is a set of additional convolutional layers that will be trained on our own data.
The choice of pre-trained base model is strongly tied to the fact that we’re going to deploy to the Jetson Nano, where we have limited memory in which to run the model: the Nano has a total of 2GB, shared between the CPU and GPU. MobileNet is designed specifically for edge and mobile applications, so it’s a sensible choice here.
It takes a lot of experimenting to come up with a model that performs well. This is where ZenML’s pipelines become really useful. ZenML allowed us to easily train and evaluate model variations by simply changing configuration files, without needing to touch the underlying code. For example, we altered the split between train, validation, and test datasets, and experimented with different learning rates and numbers of epochs, amongst a range of other settings.
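As a rough illustration of the idea, here is a sketch of a configuration-driven dataset split in plain Python. It deliberately avoids any ZenML API: the config dictionary, its keys and values, and the split_dataset helper are all hypothetical, standing in for whatever a real configuration file and pipeline step would provide.

```python
import random

# Hypothetical settings; in practice these would be read from a config file.
config = {
    "split": {"train": 0.7, "validation": 0.15, "test": 0.15},
    "learning_rate": 0.01,
    "epochs": 30,
    "seed": 42,
}


def split_dataset(items, split, seed):
    """Shuffle items deterministically, then cut into train/validation/test."""
    if abs(sum(split.values()) - 1.0) > 1e-9:
        raise ValueError("split fractions must sum to 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * split["train"])
    n_val = int(len(shuffled) * split["validation"])
    return {
        "train": shuffled[:n_train],
        "validation": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],  # the remainder goes to test
    }


# Made-up image identifiers for illustration.
image_ids = [f"card_{i:03d}" for i in range(100)]
subsets = split_dataset(image_ids, config["split"], config["seed"])
print({name: len(ids) for name, ids in subsets.items()})
```

Because the split fractions and the seed live in the configuration rather than the code, trying a different split means editing a value and re-running, and the same seed reproduces the same subsets.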
Training is also compute-intensive, and so often we want to use cloud-based resources to get scalable compute on-demand. But deploying and managing cloud resources isn’t something data scientists typically want to get involved with. ZenML have recently developed stack recipes which use Terraform to help you deploy a full MLOps stack in minutes. Because of this, we didn’t have to worry about manually setting up any of the infrastructure ourselves, saving time and simplifying the process, so we could focus on the data science problems instead.
In this article we covered our data science journey to create a model which has learnt to play Dobble. We looked at how we created a ground truth dataset, sketched out our solution, and trained a model. We also showed how ZenML has aided us throughout this entire process.
Stay tuned for the next blog post, where we’ll cover the entire pipeline in detail, from data ingestion to deployment!
Co-authored with Matt Squire