Insights • 6 minutes • Jun 29, 2023

E to the P: Taking Experimental Data Science to Production

Oscar Wong
MLOps Engineer

Co-authored by Dr Jon Carlton

Before models reach production, a great deal of experimental data science work takes place: understanding the data, exploring and transforming it into various formats, and selecting the type of model to use. It’s quite common for that experimental work to happen in notebooks - the favoured tool of the data scientist. That said, the code contained within these notebooks doesn’t usually adhere to the same standards as production-ready code.

Taking these notebooks and turning them into something that can be easily productionised is a challenge, and there’s an active discussion about how best to do it. Here at Fuzzy Labs we’ve developed a way of working and thinking that we believe will make your life easier when starting a new project whose ultimate goal is seamlessly deploying models into production. Essentially, adopting a production mindset early in the experimentation phase saves a great deal of complexity and effort when it comes time to deploy the fruits of that experimentation. We find this is a good habit that frees up mindshare for experimentation.

Why is it challenging to productionise notebooks?

Notebooks are designed for fast prototyping and experimentation, and allow you to combine code with written text and diagrams. This combination, coupled with their ease of use, is quite attractive to a lot of data scientists, as it allows them to rapidly explore ideas and hypothesise. The drawback, however, is that in many projects the outcome of this work is a model that must go into production, and that means a lot of heavy lifting to transform the notebook code into production-ready code.

Let’s take the simple example of writing a function that downloads data from a storage bucket. On the surface, and in an experimental context, writing this function is relatively trivial: it connects to the bucket, finds the right data, and downloads it. But when this function moves into a production setting, there’s more to take into account. What if the bucket or the data doesn’t exist? How do you deal with authentication? Ensuring that these functions behave in a deterministic way is crucial in a production system, and testing this in a notebook is challenging (especially when a large number of functions are defined throughout).
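To make that concrete, here is a minimal sketch of such a download function. The `StorageClient` interface, `fetch_data`, and `FakeClient` are all hypothetical names, not a real cloud SDK: the point is that by injecting the client as an argument, the missing-data behaviour can be tested without touching real storage.

```python
from typing import Protocol


class StorageClient(Protocol):
    """Minimal interface a storage client must provide (hypothetical)."""

    def exists(self, bucket: str, key: str) -> bool: ...
    def download(self, bucket: str, key: str) -> bytes: ...


def fetch_data(client: StorageClient, bucket: str, key: str) -> bytes:
    """Download `key` from `bucket`, failing loudly if it's missing."""
    if not client.exists(bucket, key):
        raise FileNotFoundError(f"'{key}' not found in bucket '{bucket}'")
    return client.download(bucket, key)


class FakeClient:
    """An in-memory stand-in for a real storage client, used in tests."""

    def __init__(self, objects: dict):
        self.objects = objects  # {(bucket, key): data}

    def exists(self, bucket: str, key: str) -> bool:
        return (bucket, key) in self.objects

    def download(self, bucket: str, key: str) -> bytes:
        return self.objects[(bucket, key)]
```

In production the same `fetch_data` would be handed a real client (with authentication handled when that client is constructed), while the tests exercise the "bucket or data doesn't exist" path deterministically.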

Experimental to Production: what’s our approach?

So, how do you take experimental data science to production? Well, it’s simple really. Here at Fuzzy Labs we follow an agile approach to projects, breaking work down into sprints, and that got us thinking: could we apply the same approach to developing production-ready MLOps?

If we take the goal of deploying a recommendation model, we split the process into two sprints: one for experimentation and one for productionisation. With one eye on the productionisation sprint, we engineer our experimentation (fetching data, training and evaluating models, and so on) as a series of fully tested, generic functions, which are brought together in our experimentation notebooks.

Digging deeper, the various steps you would normally take to train and evaluate a model are separated out into scripts, each with an associated test suite. This enables the scripts - which roughly equate to steps in a pipeline - to be easily lifted and wrapped with pipeline context. By modularising the functions within our scripts, we can test them individually to ensure they behave as expected. Additionally, experimental notebooks can feed the output of one function into another, confirming that everything still works when the functions are combined.
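As an illustration, here is a toy sketch of that structure. The step names and the "model" (a least-squares slope standing in for a real recommender) are invented for the example; what matters is that each step is a plain, individually testable function, and the pipeline is just their composition.

```python
def load_data() -> list[dict]:
    # In a real project this would fetch from storage; inline sample data here.
    return [{"x": 1.0, "y": 2.1}, {"x": 2.0, "y": 3.9}, {"x": 3.0, "y": 6.1}]


def train_model(rows: list[dict]) -> float:
    # Stand-in "model": least-squares slope through the origin.
    numerator = sum(r["x"] * r["y"] for r in rows)
    denominator = sum(r["x"] ** 2 for r in rows)
    return numerator / denominator


def evaluate_model(slope: float, rows: list[dict]) -> float:
    # Mean absolute error of the fitted slope.
    return sum(abs(r["y"] - slope * r["x"]) for r in rows) / len(rows)


def pipeline() -> float:
    # Wiring the steps together; a pipeline tool would add orchestration,
    # caching and tracking around these same functions.
    rows = load_data()
    slope = train_model(rows)
    return evaluate_model(slope, rows)
```

Because each step takes well-defined inputs and returns well-defined outputs, lifting them into a pipeline framework later is mostly a matter of adding decorators and configuration, not rewriting logic.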

In the production sprint, our focus shifts to transitioning the experimental scripts and functions created previously into a production-ready machine learning pipeline. This involves integrating the modular and well-documented components created earlier into a cohesive pipeline using tools like ZenML. Our focus in this sprint is on engineering parts of the pipeline that aren’t included in the experimental sprint, for example, a deployment step for a tool like Seldon Core. Once this is in place, we can then test the pipeline end-to-end to ensure that each component works together seamlessly. 

Once the pipeline is thoroughly tested, it can be deployed to production, enabling the product recommendation model to be used in a live environment. By dividing the project into sprints, we can prioritise specific goals, guarantee that each component is thoroughly tested and well-designed, and gradually work towards a fully functional, production-ready pipeline.

What are the benefits?

Given what we’ve talked about so far, what are the actual benefits to this way of working?


Modular functions

Each function in an experimental script should do one thing and do it well. It’s easy to let in functionality that should be separated out or implemented in a different function. Let’s say we are building a simple movie recommendation model. We might have an experimental script that scrapes data from different movie websites and formats it into a usable format for model training. Within this script, we might be tempted to write all the code to scrape and parse the data in one large function.

By breaking the code up into smaller, modular functions, we create a more organised and reusable codebase: we can test each function separately and reuse it in other parts of our pipeline.
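A minimal sketch of what that split might look like for the movie example. The markup format, function names, and the injected `fetcher` are all simplifications invented for illustration, not a real scraping stack:

```python
def fetch_page(url: str, fetcher) -> str:
    """Retrieve raw page content; the fetcher is injected so tests can fake it."""
    return fetcher(url)


def parse_titles(page: str) -> list[str]:
    """Pull movie titles out of a (deliberately simplified) page format."""
    return [
        line.removeprefix("title:").strip()
        for line in page.splitlines()
        if line.startswith("title:")
    ]


def to_training_rows(titles: list[str]) -> list[dict]:
    """Format parsed titles into the records the training step expects."""
    return [{"movie_id": i, "title": title} for i, title in enumerate(titles)]
```

Each stage - fetching, parsing, formatting - can now be tested on its own, and `parse_titles` or `to_training_rows` can be reused against a different site or data source without touching the rest.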

In addition to creating a more organised and reusable codebase, modular functions also facilitate a test-driven approach to development. By breaking down the code into smaller, independently testable units, we can easily identify and fix errors or bugs during the development process. Furthermore, we can write tests for each function separately, ensuring that it performs its intended task accurately and efficiently. This helps to catch errors before they propagate throughout the codebase, making debugging much easier and less time-consuming. The benefits of this approach extend beyond development to production as well, as it allows for easier maintenance and scalability of the codebase over time.


Generic functions

One of the key principles is to create functions that are generic enough to be imported into a variety of different notebooks. This means designing functions that can be easily configured or adapted to suit different use cases.

One effective way to achieve this is by using function arguments to control the behaviour of the function, rather than hard-coding specific values or assumptions directly into the code. By using arguments, we can make our functions more modular and flexible, allowing us to reuse them in a wider range of situations.

For example, imagine we are creating a function that queries a SQL database. Rather than hard-coding a specific SQL query directly into the function, we could instead pass the query as an argument, allowing us to reuse the same function with different queries depending on our needs. This makes the function more flexible and adaptable, and can save us time and effort in the long run by reducing the need to create multiple similar functions for different queries.
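Here is a minimal sketch of that idea using Python's built-in `sqlite3` module (the table and function names are invented for the example). The query - and its parameters - are arguments, so one function serves every query:

```python
import sqlite3


def run_query(conn: sqlite3.Connection, query: str, params: tuple = ()) -> list:
    """Execute a query passed in as an argument, rather than hard-coded."""
    return conn.execute(query, params).fetchall()


# Set up a small in-memory database to demonstrate.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ratings (movie TEXT, score REAL)")
conn.executemany(
    "INSERT INTO ratings VALUES (?, ?)",
    [("Alien", 4.5), ("Heat", 4.0), ("Alien", 3.5)],
)

# The same function handles completely different queries:
all_rows = run_query(conn, "SELECT * FROM ratings")
alien_avg = run_query(conn, "SELECT AVG(score) FROM ratings WHERE movie = ?", ("Alien",))
```

Passing parameters separately (rather than interpolating them into the query string) also keeps the function safe to reuse with untrusted values.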

Consistent IO

Another key principle to keep in mind when designing functions for machine learning pipelines is input/output consistency. By insisting on well-defined inputs and outputs, along with their associated types, we can make our functions easier to test and debug. Additionally, this can help other developers to work on scripts that depend on the output of our function concurrently.

For example, imagine we have a function that reads in a CSV file and preprocesses the data for a Machine Learning model. By clearly defining the expected input format (e.g. column names or data types) and output format (e.g. a pandas dataframe or a numpy array), we can ensure that our function is easily reusable and interoperable with other scripts and pipelines.
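A small sketch of that contract, using the standard library's `csv` module rather than pandas to stay self-contained (the function name and the `x`/`y` column names are assumptions for the example). The signature and docstring pin down exactly what goes in and what comes out:

```python
import csv
import io


def preprocess(csv_text: str) -> list[tuple[float, float]]:
    """Parse CSV text with columns `x` and `y` into (x, y) float pairs.

    Input: CSV text whose header row contains at least `x` and `y`.
    Output: a list of (x, y) tuples, one per data row.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or not {"x", "y"} <= set(reader.fieldnames):
        raise ValueError("expected columns 'x' and 'y'")
    return [(float(row["x"]), float(row["y"])) for row in reader]
```

A downstream developer can code against `list[tuple[float, float]]` immediately, and malformed input fails with a clear error instead of propagating silently.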

Furthermore, having consistent input and output can also help us to catch errors and bugs early on.


Documentation

Using docstrings to provide a clear, concise explanation of each function, along with its expected input and output, can greatly improve the readability and maintainability of our code. Clear, descriptive names for functions and arguments likewise make it easier for other developers to understand and reuse your experimental scripts and functions. This ultimately helps us build our ML project more efficiently.
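For instance, a hypothetical train/test split helper documented this way leaves no ambiguity about its behaviour (the name and split convention here are invented for illustration):

```python
def split_train_test(rows: list, test_fraction: float = 0.2) -> tuple[list, list]:
    """Split `rows` into train and test partitions.

    Args:
        rows: The full dataset, one record per element.
        test_fraction: Proportion of rows held out for testing, between 0 and 1.

    Returns:
        A `(train, test)` tuple, where `test` contains the last
        `round(len(rows) * test_fraction)` rows.
    """
    n_test = round(len(rows) * test_fraction)
    return rows[: len(rows) - n_test], rows[len(rows) - n_test :]
```

Anyone reading the docstring knows exactly what to pass and what comes back, without opening the implementation.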

Designing machine learning functions with effective collaboration in mind improves the efficiency of the development process. By clearly defining the inputs and outputs of each function and making sure that they are well-documented, any member of the team can work on a specific function independently without waiting for the others to finish their work. This means that anyone can pick up any step from the pipeline, and work on it without following a linear approach, thereby making the overall development process faster and more flexible.

While all of this may appear to be a lot of work upfront, the core point is that having well-written, well-tested code early on makes the transition from experimentation to production smoother and less time-consuming, and gives us greater confidence in our outputs.

Let us know how you tackle this problem and whether this is an approach you might take up!
