AI • 10 minute read • Nov 01, 2022

MLOps Pipeline on the Edge

Chris Norman
MLOps Engineer

This month at Fuzzy Labs we’re participating in ZenML’s Month of MLOps competition. The goal is to come up with an interesting machine learning problem and use it to build a production-grade MLOps pipeline using ZenML. In our previous blog post, we explored the data science behind our competition entry. Specifically, we looked at how our model was trained and how it works.

As a recap, we’ve set out to train a model that can play the card game Dobble. In Dobble, players are presented with two cards. Each card has eight symbols, and one of those symbols will appear on both cards; the first player to call out the match wins.

To make it even more interesting, our model will run on the edge, specifically an NVIDIA Jetson Nano. Edge-based machine learning is gaining in popularity, and there are plenty of new challenges that come with it.

This blog is the second in a three-part series:

The data science (the previous one): how our model works and how we trained it.

The pipeline (this one): a deep dive into the ZenML pipeline that ties everything together, going from data input to model deployment.

The edge deployment: how we deploy the final model to a Jetson Nano.


Are you a frequent user of Jupyter Notebooks? Have you ever accidentally run notebook cells in the wrong order? We’ve all been there: trying to figure out why a notebook isn’t working, pressing ‘Restart & Run All’ in frustration, and only then finding out that we had been running cells in the wrong order.

Notebooks are great for experimentation but once you move towards deployment or working with more than a couple of people, the cracks start to show: collaboration is tricky and reproducibility is even harder. I’ve been guilty of creating long and confusing notebooks that don’t adhere to software engineering principles. So how can we ensure that we are creating readable, reproducible, reusable and reliable machine learning code?

The introduction and title have already spoiled the answer to this question…


So, what are pipelines and why are they so useful? A pipeline is a set of automated processes that enable code to be compiled, built and deployed reliably and efficiently. This matters even more in machine learning, where reproducibility is essential. ZenML is an incredibly useful tool for implementing machine learning pipelines; it allows us to split up the machine learning process into individual, reusable functions, which we call steps. Alongside this, ZenML makes it easy to integrate other tools into our pipelines, such as experiment trackers (for easy collaboration), data validation or model deployment.

The pipeline brings together (orchestrates) each of these steps and passes data and models between the steps, increasing reliability and reproducibility. 
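To make the idea of steps and orchestration concrete, here is a minimal sketch in plain Python. It deliberately does not use ZenML's API; the function names and the toy "model" (a single slope fitted by least squares) are hypothetical stand-ins chosen only to show how a pipeline passes outputs from one step to the next.

```python
from typing import Dict, List, Tuple

Dataset = List[Tuple[float, float]]


def load_data() -> Dataset:
    """Step 1 (hypothetical): produce (x, y) pairs lying on the line y = 2x."""
    return [(float(x), 2.0 * x) for x in range(1, 6)]


def train_model(data: Dataset) -> float:
    """Step 2 (hypothetical): fit the slope w of y = w * x by least squares."""
    numerator = sum(x * y for x, y in data)
    denominator = sum(x * x for x, _ in data)
    return numerator / denominator


def evaluate_model(weight: float, data: Dataset) -> float:
    """Step 3 (hypothetical): mean absolute error of the fitted slope."""
    return sum(abs(weight * x - y) for x, y in data) / len(data)


def run_pipeline() -> Dict[str, float]:
    """The 'pipeline': call each step in order and pass artifacts between them."""
    data = load_data()
    weight = train_model(data)
    error = evaluate_model(weight, data)
    return {"weight": weight, "error": error}


result = run_pipeline()
print(result)
```

Because each step is an ordinary function with explicit inputs and outputs, it can be tested, reused or swapped out on its own. A pipeline tool adds things like caching, artifact tracking and remote execution on top of this basic shape.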

Think of pipelines as the building foundations of machine learning.

Just like a house, our machine learning models won’t be robust without foundations. Machine learning is an incredibly exciting field, and it can be tempting to jump straight into training a model, but the infrastructure must be considered first to avoid headaches in the later stages of development. To start our pipeline development, we first considered what we wanted from the pipeline. As stated before, the goal is to have readable, reproducible, reusable, and reliable code.

From a high level, we needed the following steps in our pipeline:

  • Data cleaning
  • Data validation
  • Training
  • Testing and validating the model
  • Deploying the model

These steps are common to almost every machine learning project. The next step was to outline a ‘minimum viable product’ which could ingest and format the data, train the model, and finally, deploy this model on the hardware. Because each step in a pipeline depends heavily on the steps before it, later steps cannot be developed until the earlier ones exist, and this creates a bottleneck. In our case, the initial pipeline development was not as fast as we’d hoped. The goal of having the minimum viable product was to pass this bottleneck as early as possible and allow our team to work on individual features in parallel.

Our pipelines

We took the approach of creating two pipelines:

  1. Data Pipeline - deals with cleaning and preparing data.
  2. Training Pipeline - deals with creating our model.

Finally, as we are deploying the model on an edge device, our deployment step must be separate from the ZenML pipelines.

Minimum viable pipeline.

It is essential to consider how your data and models are passed between the steps and pipelines. Each pipeline has an input and an output, which are discussed individually below.

Data flow within our pipelines.

Data Pipeline

Before we could train a model, we required a dataset. Not only did we need a dataset, but we also needed to ensure that our data was cleaned, validated, unbiased, correctly labelled and in the right format for our task. 

We separate out the data processing steps because fetching, cleaning and validating data is redundant if you’re not adding new data. Running these steps every time we train a model would be a waste of time and resources. The data pipeline therefore only needs to be run once for each new dataset.

In our case, ‘cleaning’ is not included in our data pipeline; this happens during the manual labelling process on Labelbox. Images are taken, cropped, and bounding boxes are drawn around each Dobble symbol (check out the previous blog for more detail). 

Machine learning models are only as good as your data, so validating your data is important. ZenML includes integrations for data validation tools, such as Deepchecks. This library contains a suite of checks for ‘vision’ data. It performs statistical analysis to find outliers within your dataset and checks whether your training and testing datasets are representative of one another. These results can then be used to amend and improve your datasets in the future.

We also required the image data to be in Pascal VOC (a specific format for object detection datasets). This conversion happens in its own step. By separating functionality into smaller steps we increased readability, interpretability, and reusability.  Once the data was in the VOC format, we split the dataset into training, testing and validation datasets.
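As an illustration of what a conversion step like this produces, the sketch below writes a single Pascal VOC annotation using only the Python standard library. The input shape (a list of label and pixel-coordinate tuples), the file name and the symbol label are assumptions made for the example; the actual Labelbox export and our conversion code are not shown here.

```python
import xml.etree.ElementTree as ET


def to_voc_xml(filename, width, height, boxes):
    """Build a Pascal VOC annotation as an XML string.

    boxes is a list of (label, xmin, ymin, xmax, ymax) tuples in pixels.
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename

    # Image dimensions; depth 3 assumes an RGB image.
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    ET.SubElement(size, "depth").text = "3"

    # One <object> element per labelled symbol on the card.
    for label, xmin, ymin, xmax, ymax in boxes:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = label
        ET.SubElement(obj, "difficult").text = "0"
        bndbox = ET.SubElement(obj, "bndbox")
        for tag, value in zip(("xmin", "ymin", "xmax", "ymax"),
                              (xmin, ymin, xmax, ymax)):
            ET.SubElement(bndbox, tag).text = str(value)

    return ET.tostring(root, encoding="unicode")


# Hypothetical example: one card image with a single labelled symbol.
xml_text = to_voc_xml("card_001.jpg", 640, 480, [("anchor", 10, 20, 110, 120)])
print(xml_text)
```

In a VOC-style dataset, each image typically gets one such XML file alongside it, which is what object detection training code then reads.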

Our data starts as images of Dobble cards which have been labelled using a tool called Labelbox - check out the previous blog for more specifics about this process. The data is then fetched using their API and downloaded locally. This data is then converted into the required format for a Single Shot Multibox Detector (VOC format) and split into train, test and validation datasets. 
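The train, test and validation split can be sketched as follows. The 70/15/15 proportions, the fixed seed and the image identifiers are assumptions for illustration only; they are not the values used in our pipeline.

```python
import random


def split_dataset(items, train_frac=0.7, test_frac=0.15, seed=42):
    """Shuffle items reproducibly and split them into train/test/validation.

    Whatever is left after the train and test portions becomes validation.
    """
    shuffled = list(items)
    # A seeded generator keeps the split identical from run to run.
    random.Random(seed).shuffle(shuffled)

    n_train = int(len(shuffled) * train_frac)
    n_test = int(len(shuffled) * test_frac)

    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]
    return train, test, validation


# Hypothetical image identifiers standing in for the labelled card images.
image_ids = [f"card_{i:03d}" for i in range(100)]
train, test, validation = split_dataset(image_ids)
print(len(train), len(test), len(validation))
```

Fixing the seed matters for reproducibility: rerunning the data pipeline on the same dataset yields the same three subsets, so models trained at different times can be compared fairly.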

The final step in this pipeline uploads the data to an S3 bucket, ready for the training pipeline.

Training Pipeline

We were now ready to train a model (we used PyTorch to do this). Within this pipeline, we needed to download the data from our remote storage bucket, create a PyTorch DataLoader for each dataset, train and evaluate the model, convert the model to ONNX format, and finally, register the experiment and model to be used later.

The first step is downloading the data from the remote storage. This allowed us to run the pipeline either locally or on the cloud as the data is centrally stored. In the future, we could easily replace this with a data version control tool, such as DVC.

After retrieving the data, we create PyTorch DataLoaders for training, testing and validation. The next step is to train the model. We use a general-purpose pre-trained convolutional neural network, MobileNetV3, which we fine-tune during training for our single-shot object detection task.

The next step was to evaluate the model using the test dataset. Evaluating object detection is a little more complicated than evaluating standard image classification. We first run inference on each test image, then compare each predicted bounding box with each real bounding box using the intersection over union (IoU) metric: the area where the predicted and real boxes overlap, divided by the area of their union. If the IoU is greater than a specified threshold and the predicted label is the same as the real label, we say the object has been classified correctly. If the IoU is over the threshold for another bounding box but the labels do not match, we say it has been classified incorrectly. This enables us to calculate accuracy and mean average precision (mAP) metrics to compare models, and to create a confusion matrix to visualise the classification results.
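The matching rule described above can be sketched in a few lines of Python. This is a simplified illustration, not our evaluation code: it assumes boxes are (xmin, ymin, xmax, ymax) tuples, uses a threshold of 0.5, and judges each prediction against the single real box it overlaps most. The symbol labels are made up.

```python
def iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    # Corners of the overlapping rectangle, if there is one.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Width and height are clamped at zero when the boxes do not overlap.
    intersection = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def judge_prediction(pred_label, pred_box, truths, threshold=0.5):
    """Return 'correct', 'incorrect' or 'unmatched' for one prediction.

    truths is a list of (label, box) pairs for the same image.
    """
    best_iou, best_label = 0.0, None
    for label, box in truths:
        score = iou(pred_box, box)
        if score > best_iou:
            best_iou, best_label = score, label

    if best_iou <= threshold:
        return "unmatched"  # no real box overlaps enough
    return "correct" if best_label == pred_label else "incorrect"


# Hypothetical ground truth for one card image.
truths = [("anchor", (0, 0, 10, 10)), ("heart", (20, 20, 30, 30))]
print(judge_prediction("anchor", (1, 1, 10, 10), truths))
print(judge_prediction("anchor", (20, 20, 30, 29), truths))
print(judge_prediction("heart", (50, 50, 60, 60), truths))
```

Counting the 'correct' and 'incorrect' outcomes per label across the whole test set gives the entries of a confusion matrix, and sweeping the confidence and IoU thresholds is the basis for precision-based metrics such as mAP.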


To generalise the model to be used in other applications we convert the PyTorch model to the ONNX format. The ONNX format has the advantage of interoperability: we can use our trained model with other inference frameworks, and as this format makes it easier to access hardware optimisations, it meant we could maximise performance when deploying the model to an edge device.

The final step within the training pipeline is to register the trained model with an experiment tracker, alongside the parameters used to train it and any metrics we obtained from it. The models we trained can be directly compared with each other this way. ZenML tracks model metadata and can log it automatically - no more painful manual plotting of loss and accuracy graphs required!

Training on the cloud with ZenML

A key feature of ZenML is the flexibility with which you can run each step. ZenML is built around a concept called stacks. A stack is a combination of MLOps stack components, each of which is a configuration of an MLOps tool. These components include artifact stores, experiment trackers, orchestrators and step operators. The orchestrator is responsible for running the machine learning pipelines. The step operator is useful if we want to run a single step in a special environment, for example, running only the training step on a powerful GPU cluster in a cloud environment such as SageMaker.


If we want to run a pipeline on the cloud we might need to provision:

  • Storage buckets
  • Compute clusters
  • Secret managers
  • And other resources

ZenML has premade ‘stack recipes’ that allow easy provisioning of these resources (using Terraform). These stack recipes can also be adapted for your specific pipeline. Once we are finished with the resources, a single ZenML command deletes and cleans up everything that was created on the cloud. Therefore, as long as we have considered how our data and models are passed between pipelines (e.g. where and how the data is being ingested), moving a pipeline from running locally to running on the cloud should only require switching the stack that we are using.

Edge deployment steps

The most common way of deploying a model is to host it on the cloud and give users access to it through an API. In our case, we’re deploying to an edge device, which means our deployment steps differ from the usual approach.

Unfortunately, the ZenML package cannot currently be installed on an ARM64 device. Hence, we will not be using ZenML for the deployment section of this task.

The next blog will discuss our approach to model deployment on the NVIDIA Jetson Nano in more detail.

Complete pipeline architecture

Putting together the implementations discussed above gives us the final architecture of the data and training pipelines:

Complete Dobble pipeline architecture diagram.


In this blog we have looked at:

  • The uses of a pipeline tool such as ZenML in creating and deploying reliable machine learning models.
  • How our team leveraged ZenML to create an ML model to play the game Dobble.
  • Some takeaways and key points from our pipeline development stages.

In the third and final blog of this series, we will focus on the specifics of deploying the model to an NVIDIA Jetson Nano and running inference on the ‘Edge’.
