Flying Blind

Failing to monitor your models is akin to crossing your fingers and hoping for the best.



An introduction to the benefits of model monitoring.

Model monitoring is normally the last thing anybody thinks about when they look to develop a machine learning capability. It’s counterintuitive to think about how to monitor a model that doesn’t exist yet. Why would you put model monitoring in place before you’ve proven that you can even build an effective model from your data? Well, we’re going to tell you.

When developing a model, you’re probably going to focus initially on tools that help your team collaborate effectively. After that, you’ll look into automating the deployment of models via a pipeline tool. Typically, it is only at this stage that you start to think about monitoring the model that you’ve spent so much time and effort building and deploying.

What is your model doing once you’ve deployed it? How do you know if it is working as expected? Is it producing the right kinds of results? Is it performing quickly enough? Is it even running at all?!

Lots of questions, but by this stage, it’s going to be painful to retrofit a monitoring solution, and perhaps too late!

Picture this.

A fintech startup has built a model that predicts what a business’s bank balance will be in the next 30 days. They’ve spent a lot of time creating a system that does this, using historical data from the business’s transactions to identify regular transactions, along with their frequency and value, in order to give a prediction. The model is retrained every month as new transactions arrive, and they query the model every day to get an updated prediction.

This system allows their customers to predict their cash flow, helping them to time payments to suppliers and to see when they might need a cash injection to avoid running out of money.

So there’s quite a bit going on: regular retraining, deployment and querying of the model. But if the deployed model is broken, how do they know? It might be returning results, but are they the right results? How do they know what the right results should be?

They would need to think about different ways in which the model can go wrong in real life. Suppose when they trained it, they figured out that business bank customers fall into a few categories, e.g. some are seasonal and invoice mostly in the summer, and some have regular transactions all year round. Others are less predictable and have a more variable (and less accurate) cash forecast.

What happens if they encounter a business that doesn’t fit into one of these classifications? What does their model do? How do they know when this has occurred so that they can take some action to retrain a new model for this new classification?

The solution

Unsurprisingly, we would suggest that a model monitoring solution is required.

The solution should be one that:

  • Tells you if the model has stopped running.
  • Flags that the model is returning results too slowly. 
  • Alerts you if the results are outside of a range of expected values.
  • Lets you know when the input data starts to drift too far from the data used in training.
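
The first three checks in this list can be sketched in a few lines of Python. This is a minimal illustration, not a production monitor: the `ModelStatus` record, the `check_model` function and every threshold are hypothetical names and values chosen for the example. Input drift needs a training baseline, so it is left out here.

```python
from dataclasses import dataclass


@dataclass
class ModelStatus:
    """Snapshot of what the monitor last observed (hypothetical structure)."""
    seconds_since_last_prediction: float
    last_latency_seconds: float
    last_prediction: float


def check_model(status, max_silence=86_400.0, max_latency=2.0,
                expected_range=(-1_000_000.0, 1_000_000.0)):
    """Return a list of alert messages; an empty list means all checks passed.

    All thresholds are illustrative assumptions, not recommended values.
    """
    alerts = []
    # The model is queried daily, so a longer silence suggests it has stopped.
    if status.seconds_since_last_prediction > max_silence:
        alerts.append("model has stopped running")
    if status.last_latency_seconds > max_latency:
        alerts.append("model is responding too slowly")
    low, high = expected_range
    if not low <= status.last_prediction <= high:
        alerts.append("prediction outside expected range")
    return alerts


healthy = ModelStatus(3_600.0, 0.4, 12_500.0)
broken = ModelStatus(200_000.0, 5.0, 10_000_000.0)
print(check_model(healthy))  # []
print(check_model(broken))
```

Returning a list of messages rather than a single pass/fail flag means one bad prediction can raise several alerts at once, which is usually what you want to see on a dashboard.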

Monitoring a model isn’t the same as monitoring a web server, or a database. For those, you mainly want to check that they’re still running, and keep an eye on their resources. That’s easy enough.

With machine learning, monitoring needs to be designed in tandem with the model. This makes sense: if you want to check whether a model’s inputs are drifting, you need a baseline to compare against. That baseline comes from the training data, which of course is different for every model.
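
To make the idea of a baseline concrete, here is a rough sketch of one common way to compare live inputs against training data: the two-sample Kolmogorov-Smirnov test. It is a stand-in chosen for illustration, not necessarily the method a given monitoring tool uses. The function names, the 5% significance level and the made-up transaction amounts are all assumptions.

```python
import math


def ks_statistic(baseline, live):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical cumulative distributions of the two samples."""
    a, b = sorted(baseline), sorted(live)
    i = j = 0
    gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance past every value equal to x so ties are handled together.
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        gap = max(gap, abs(i / len(a) - j / len(b)))
    return gap


def has_drifted(baseline, live, alpha=0.05):
    """Flag drift when the statistic exceeds the large-sample critical value."""
    n, m = len(baseline), len(live)
    critical = math.sqrt(-math.log(alpha / 2) / 2) * math.sqrt((n + m) / (n * m))
    return ks_statistic(baseline, live) > critical


# Made-up transaction amounts: the baseline stands in for the training data.
baseline = [100 + (k % 50) for k in range(200)]
live_same = [100 + (k % 50) for k in range(100)]
live_shifted = [130 + (k % 50) for k in range(100)]

print(has_drifted(baseline, live_same))     # False
print(has_drifted(baseline, live_shifted))  # True
```

Because the baseline is just a sample of the training data, it has to be captured and stored at training time, which is why monitoring is easier to design alongside the model than to bolt on afterwards.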

To make this work, you need to deploy a special metrics-gathering service that sits next to the bank balance prediction model. Think of this as the model’s buddy; it watches everything that goes in and out, and does some calculations to see if anything looks unusual.

An example of something unusual could be bank transactions that don’t fit the assumptions you made during training. Another is a predicted bank balance that seems statistically unlikely, e.g. yesterday it was £1,000 but today the prediction is £10,000,000!
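
One simple way to decide whether a prediction is "statistically unlikely" is to compare it with the recent history of predictions for the same business. The sketch below uses a robust z-score based on the median, so that one wild value does not distort the yardstick. The function name, the threshold of 5 and the sample balances are assumptions made for the example.

```python
import statistics


def is_unlikely(history, new_value, threshold=5.0):
    """Flag new_value if it lies more than `threshold` robust z-scores
    from the median of recent predictions."""
    median = statistics.median(history)
    # Median absolute deviation: a spread estimate that ignores extreme values.
    mad = statistics.median(abs(x - median) for x in history)
    if mad == 0:
        # No spread at all in the history: any different value is unusual.
        return new_value != median
    # 1.4826 rescales the MAD to match a standard deviation for normal data.
    robust_z = abs(new_value - median) / (1.4826 * mad)
    return robust_z > threshold


# Illustrative recent predictions, in pounds, for a single business.
history = [980.0, 1010.0, 1000.0, 995.0, 1020.0, 1005.0, 990.0]

print(is_unlikely(history, 1015.0))        # False
print(is_unlikely(history, 10_000_000.0))  # True
```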

In this scenario, the tool we’d recommend is Alibi Detect, from Seldon (especially if you were already using Seldon to deploy and serve the model). Alibi Detect lets you deploy a detector (that’s the thing which calculates metrics) alongside any model.

We would then plumb all the model metrics into Prometheus, a popular open-source monitoring and alerting system. The benefit of Prometheus is that it can serve as a central place for all application monitoring, covering not only models but also web services, databases, etc.
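
Prometheus collects metrics by scraping a plain-text page from each service. To show roughly what a model metric looks like on that page, here is a small sketch that renders one gauge by hand. In practice a Prometheus client library would normally do this for you. The metric name, label and value are hypothetical, and the sketch skips the escaping of special characters that a real exporter needs.

```python
def render_gauge(name, help_text, value, labels=None):
    """Render one gauge in the Prometheus text exposition format.

    Simplified: label values are not escaped, so avoid quotes,
    backslashes and newlines in them.
    """
    label_text = ""
    if labels:
        pairs = ",".join(f'{key}="{val}"' for key, val in sorted(labels.items()))
        label_text = "{" + pairs + "}"
    return (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} gauge\n"
        f"{name}{label_text} {value}\n"
    )


# Hypothetical metric name and label, chosen for this example.
page = render_gauge(
    "model_prediction_latency_seconds",
    "Time taken by the last prediction.",
    0.42,
    {"model": "balance_forecast"},
)
print(page)
```

The label lets several models share one metric name, so a single dashboard query can compare them side by side.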

Finally, we would need dashboards. Grafana plays nicely with Prometheus and, because Prometheus has metrics from all our application components, not only models, those dashboards could give us real-time insights into how the entire solution is performing.

Don’t fly blind

With no visibility into what your model is doing in production, you can’t act to fix any problems that occur due to unknown data inputs. Businesses could be given inaccurate financial predictions and lulled into a false sense of security by the poorly performing model, potentially with catastrophic consequences.

Often bad information is worse than no information, as it fills you with misplaced confidence and encourages you to make poor decisions.

A machine learning solution is only as good as its weakest point. You could have the best algorithm in the world, but if it’s been trained on data that’s out of date (and you don’t know about it) then you’re in for trouble.

Model Monitoring

A model monitoring solution allows you to act quickly and update your models when they start to drift away from producing sensible results. You don’t want to find out that your models are no good only when your users tell you or, worse, when they become disillusioned and stop using your service altogether.

When the monitoring solution alerts you that your results are beginning to drift, your data scientists can start investigating the problem immediately and update the model before the drift has had a significant impact on your customers.