Mental health problems are something that everyone struggles with at various points in their life, and finding the right type of help and information can be hard, sometimes hard enough to become a barrier to getting support. Mind - the main mental health charity in the UK - summarises the importance of information access best: when you’re living with a mental health problem, or supporting someone who is, having access to the right information is vital.
Whether someone is seeking information for themselves or on behalf of someone else, filtering through what is available can be challenging and time-consuming. Both the Mind and the NHS Mental Health websites do a great job of covering a broad range of topics under the umbrella of mental health, and point to various charitable, professional, and clinical organisations which can provide support.
But what if we could streamline and increase the ease of access to the great information these organisations provide? Here's an introductory video from our CEO, Tom, to tell you a little more about the project:
MindGPT
We’re setting out to build a conversational system where people can ask mental health-oriented questions and receive answers which summarise content from the two leading mental health websites mentioned previously: Mind and NHS Mental Health.
The system won’t act as a digital counsellor, but rather as a gateway to vital information sources, summarising two authoritative perspectives and providing pointers to the original content.
In building this, we’re drawing on our expertise in MLOps and prior experience in fine-tuning open-source LLMs for various tasks (see here for an example of one tuned to summarise legal text).
What’s coming up
There’s a lot to cover in this project, and we have plenty of interesting content planned for you:
1. DataOps & MindGPT: no data means no ML, so how do you efficiently scrape websites in a well-engineered and automated way?
2. LLMs & Vector Databases: we’ll be generating embeddings, and being able to store and query them efficiently is important, so here we’ll take a deeper look at vector databases and their usefulness when working with LLMs.
3. MindGPT - a tour of the first version: we’re aiming to have a thin slice up and running pretty quickly and here we’ll take you on a tour of how we’ve done that.
4. I say, you do: data is crucial in this project and in this post you’ll see the steps we’ve taken to prepare data ready for an LLM.
5. LLMs & Data Version Control: versioning data is a must but how do we solve that technical challenge in this project?
6. Monitoring LLMs in the wild: this post will take a deep dive into model monitoring for LLMs.
7. Prompt Engineering: we need to tell the LLM what to do, and getting those instructions correct can be tricky, so in this post we delve into the dark art of engineering these instructions to maximise the model’s capabilities.
8. Evaluating Large Language Models: how can you tell if the output of your LLM is accurate or even makes sense? In this post, we demonstrate how you can evaluate your LLM to ensure it’s generating reliable responses.
9. Guardrails for Large Language Models: MindGPT should be focused on providing reliable and informative answers to user questions, but that doesn’t always happen. In this post, we show how you can add guardrails to prevent off-topic questions being answered.
We hope you like the content. For those of you who prefer to see how we're progressing, here's a link to the GitHub repository.
Motivations
Our motivations for building this system break down into two parts: community and technical.
AI for good
We want to build something that is useful for the public: an intuitive way to access information easily. But why would the public want this? It’ll enable them to find information about the problem(s) they, or someone they care for, are facing. Summarising information from two different sources will bring together two slightly different (or even the same) perspectives on a topic, giving users a diversity of information.
More broadly, anyone who wants to find out more about a topic will be able to use the system, asking questions such as “what are the symptoms of depression?”
Showcasing open source
Here at Fuzzy Labs, we love technical challenges and we have a history of excelling at them (see our winning solution to the ZenML month of MLOps challenge).
This is one of the largest and most ambitious projects we’ve tackled, and it poses a range of technical challenges, from collecting the right data and versioning it, to fine-tuning multiple models and interfacing with vector databases.
In building MindGPT we’ll be making use of some of our favourite open source tools such as ZenML and Seldon, along with getting familiar with new open source tools.
If you can’t tell, we’re keen on open source, and using open source tools is incredibly important in this project. Privacy and ethics are a focus here: we want to ensure that sensitive information that users may input into MindGPT is handled in a way that is completely transparent to the user. Open source enables us to do this - users can see how their data is used and how it moves through the various pipelines; we’re not putting user data into a black box.
AI safety is a popular topic at the moment (rightly so), and many of the concerns are fuelled by the black box nature of existing solutions. Using open source tools and developing this project in the open means that we can address some of these safety concerns.
There’s a lot to be said here and we’ll talk in more depth about this in future content, which is outlined in the “What’s coming up” section above.
We've outlined the project diagrammatically below. We've also colour-coded the diagram to illustrate how we're progressing through the project: green parts are complete, while orange parts represent what we're currently working on. As you can see, we're just working through the data collection piece now!
To learn a little more about the technical detail of the project, our CTO, Matt, explains exactly how we're going to build MindGPT.
We’re excited about this project and can't wait to share more. Next up we'll be covering DataOps and the art of web scraping. We hope you’re as pumped as us, so stay tuned for updates!
You can read our next blog in the series, MindGPT: DataOps for Large Language Models, here.