We’re on a journey to build MindGPT, a mental health conversational system where people can ask questions and receive answers summarised from two leading mental health websites. The first part of that journey was building a thin slice of the whole system, enabling us to assess our approach, uncover places for more in-depth thought, and most importantly, have a foundation to build from.
In this blog, we’ll show MindGPT in action, demonstrating how we’ve used in-context learning combined with our datasets to get answers from our base model.
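At its heart, in-context learning means placing retrieved reference text into the prompt rather than fine-tuning the model. Here's a minimal sketch of that idea; the template wording and function name are illustrative, not MindGPT's actual implementation:

```python
# A minimal sketch of in-context learning: retrieved passages are placed
# directly into the prompt so the base model can answer from them.
# The template and function name here are illustrative only.

def build_prompt(question: str, context_passages: list) -> str:
    """Combine retrieved passages and the user's question into one prompt."""
    context = "\n\n".join(context_passages)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

prompt = build_prompt(
    "What is anxiety?",
    ["Anxiety is a feeling of unease, such as worry or fear."],
)
print(prompt)
```

The key property is that the model needs no retraining: changing what we retrieve changes what it answers from.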
What’s in the first version?
MindGPT is made up of four core components, which we’ll cover in a bit more detail here, and runs on Azure infrastructure provisioned by our Matcha tool. Matcha provisions, and provides access to, an infrastructure stack for running pipelines and deploying services; for example, the vector database runs as a service on Kubernetes alongside the deployed LLM.
This first end-to-end version is composed of the following:
There are two data pipelines: one to scrape the data from the NHS Mental Health and Mind websites, and another to prepare the data for inference and perform validation. What’s new is data version control, which we’ll talk about in more detail in a later blog post; in summary, it gives us full data provenance for any run of the pipeline(s). That means we can trace back from any inference to the specific data points (scraped from the websites) that were used, which is great for openness around data usage.
Words can’t be used directly as input to LLMs; they need to be converted to a format the model understands. We do this by embedding words and sentences, which simply means converting them into semantically meaningful numerical representations.
We store these embeddings in a vector database, which is a special type of database that allows for efficient similarity querying. Once we have a set of words that are similar, we use them as context for our model. If you’re thinking, “semantically meaningful what?”, this process is discussed in much more detail in MindGPT: Vector Databases.
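To make the retrieval step concrete, here's a toy, in-memory stand-in for a vector database: it stores (text, embedding) pairs and returns the entries most similar to a query by cosine similarity. The embeddings are made up for illustration; a real vector database, like the one MindGPT deploys on Kubernetes, uses approximate indexes to make this efficient at scale.

```python
import math

# A toy in-memory "vector store": (text, embedding) pairs queried by
# cosine similarity. Embedding values are invented for illustration.

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

store = [
    ("Anxiety is a feeling of unease.", [0.9, 0.1, 0.0]),
    ("Exercise can improve your mood.", [0.1, 0.8, 0.3]),
    ("Sleep problems are common.", [0.0, 0.2, 0.9]),
]

def query(embedding, top_k=2):
    """Return the top_k stored texts most similar to the query embedding."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(embedding, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:top_k]]

# An embedded question about anxiety sits closest to the anxiety passage.
print(query([0.85, 0.15, 0.05]))
```

The texts returned by `query` become the context passages we feed to the model.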
Deploying LLMs can be a challenge (unless you’re one of the big tech players), as you’re balancing quality against size: larger models typically give better results but come at a higher infrastructure cost. As this is our first version of MindGPT, we’ve kept the model small, opting for the small variant of Flan-T5.
To deploy this model, we’re making use of Seldon, which supports deploying HuggingFace transformer models to Kubernetes and is a tool we have plenty of experience with. We’ve kept this simple, and now that we have a first end-to-end version of MindGPT deployed, we’re planning to improve it by increasing the model size and parameterising the deployment, amongst other experiments.
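For a flavour of what this looks like, here's an illustrative `SeldonDeployment` manifest using Seldon's prepackaged HuggingFace server. The resource name is ours and the exact fields may differ from MindGPT's real configuration, so treat this as a sketch rather than the definitive setup:

```yaml
# Illustrative only: a SeldonDeployment serving Flan-T5 small via
# Seldon's prepackaged HuggingFace server. Names are hypothetical.
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: mindgpt-llm
spec:
  predictors:
    - name: default
      replicas: 1
      graph:
        name: transformer
        implementation: HUGGINGFACE_SERVER
        parameters:
          - name: task
            type: STRING
            value: text2text-generation
          - name: pretrained_model
            type: STRING
            value: google/flan-t5-small
```

The appeal of this approach is that swapping to a larger model is, in principle, a one-line change to `pretrained_model`.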
As always, expect a more detailed blog on this coming soon!
MindGPT wouldn’t be anything without a user interface - we can’t expect everyone to send `curl` requests to the model! We’re using Streamlit as our interface, which provides a simple way to create a great-looking interface on top of data-driven applications.
This is deployed to Kubernetes and communicates with the other services, such as the vector database and deployed model, when the user inputs a question.
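The round trip behind the interface can be sketched with the three services - embedding, vector database, and deployed model - stubbed out as plain functions. In the real system these are network calls between services on the same Kubernetes cluster; all names here are illustrative:

```python
# A sketch of what happens when a user submits a question. The three
# services are stubbed as plain functions; in reality each is a
# network call to a service on the cluster. All names are illustrative.

def answer_question(question, embed, search, generate):
    """Orchestrate one question: embed it, retrieve context, query the model."""
    query_vector = embed(question)
    context = search(query_vector)
    prompt = f"Context: {' '.join(context)}\nQuestion: {question}\nAnswer:"
    return generate(prompt)

# Stubs standing in for the real deployments.
embed = lambda text: [float(len(text))]                  # placeholder embedding
search = lambda vec: ["Anxiety is a feeling of unease."]
generate = lambda prompt: f"(model output for: {prompt[:40]}...)"

print(answer_question("What is anxiety?", embed, search, generate))
```

Keeping the orchestration this thin means the interface stays a simple front door: all the heavy lifting lives in the deployed services.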
To summarise: we’ve covered the components of MindGPT and seen the first version of the project, and we’re not stopping there. The project is in active development, and we’re working on some interesting things, for example adding system and model monitoring, continuous deployment, and some more bleeding-edge tech (exciting, I know).
If you want to dig deeper into the project, then I’d encourage you to check out the project repository.
As mentioned, there’s a lot happening on this project, so what’s coming next? Our next blog update will discuss model monitoring, demonstrating its usefulness for LLMs, and how we’ve implemented it in MindGPT.