Large Language Models (LLMs) are becoming increasingly ubiquitous, deployed across a range of domains for various tasks. As they’re trained on vast amounts of data scraped from the internet and because of their inherent randomness, their behaviour can be unpredictable. Sometimes they can go off topic, share harmful information, or produce noise (here's an example of just that). Preventing that from happening at the model level is incredibly difficult.
It’s not just the model that can behave unexpectedly. Users can also ask questions which are off topic or designed to attack the underlying model. I’m sure we’ve all tried to get ChatGPT to reveal its base prompt (the initial instructions on how it should construct a response) before!
This is where Guardrails come in.
These are configurable boundaries that an LLM is allowed to operate in, ensuring that our LLM behaves as you expect it to. Essentially, you can think of guardrails as a method to ensure predictable and reliable behaviour.