Cybersecurity for Large Language Models - Introduction
Large Language Models are a new frontier for cyber security. It’s important not to get so caught up in the hype that we forget about good defence: early adopters stand out, making them vulnerable to attack and easy prey for increasingly sophisticated adversaries.
In this blog series, we’ll discuss what it means to defend LLMs, the vulnerabilities and tactics used by attackers, and how to defend ourselves. We’ll approach these questions from an MLOps standpoint too, giving some insight into how infrastructure influences security, and how to think about security in your MLOps tooling.
Are large language models secure?
Picture yourself as a wildebeest in the African savannah. Threats are everywhere: lions, hyenas, cheetahs, and so on. But you’ve got one thing going for you: you’re highly adapted to this threat landscape. You can hide among a herd, you can run at up to 50 mph, and you can work as a group to survive.
As far as ‘ordinary’ software engineering goes, the situation is similar. Everything we build will come under attack, but we’re familiar with the threats, e.g. SQL injection attacks, and we can deploy standardised defences against them. Software engineers aren’t typically cyber security experts, but there are specialists who can analyse a system thoroughly, discover vulnerabilities and advise on secure design.
With machine learning models, and especially generative AI models such as LLMs, it’s as if an entirely new set of threats has entered our savannah. The equilibrium is broken and, as the industry goes full steam ahead in adopting technologies like Large Language Models, we face a new scale of challenge in understanding how to secure them.
The first challenge is one of scale: the models are massive, the behaviours they exhibit are vast, and there are dozens of different models, each trained in different ways that give rise to different characteristics.
Secondly, the range of potential attacks is broad. Let’s consider a few scenarios:
Model stealing: an attacker tries to reverse engineer an LLM in order to figure out what data it was trained with, perhaps to steal proprietary information.
Exploiting tool-use: suppose an LLM is capable of querying a database, or calling an API. The attacker may try to exploit this capability in order to compromise external systems.
Exploiting decisioning: in a similar vein, an LLM that is responsible for making decisions, such as to approve or deny a loan application, becomes a target because of that capability.
Human harm: here, the LLM isn’t being attacked per se, but it’s being used to cause harm to people, for example by generating highly personalised scam emails.
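To make the tool-use scenario concrete, here is a minimal sketch of one defensive pattern: checking an LLM-requested tool call against an allowlist before executing it. All function and tool names here are hypothetical, for illustration only, and this is a sketch of the idea rather than a production implementation.

```python
# Hypothetical sketch: validating tool calls requested by an LLM before
# executing them. Tool names and structures are illustrative, not a real API.

ALLOWED_TOOLS = {"search_products", "get_order_status"}

def dispatch_tool_call(tool_name: str, arguments: dict) -> str:
    """Execute a tool call only if it is on the allowlist."""
    if tool_name not in ALLOWED_TOOLS:
        # An attacker who tricks the model into emitting an arbitrary
        # tool call (e.g. "run_sql") is stopped here.
        raise PermissionError(f"Tool {tool_name!r} is not permitted")
    # ... dispatch to the real tool implementation here ...
    return f"executed {tool_name}"

# A benign call succeeds; an injected call raises PermissionError.
result = dispatch_tool_call("get_order_status", {"order_id": 42})
```

The key design point is that the allowlist lives outside the model: no matter what text an attacker coaxes the model into generating, the dispatcher only ever runs tools you have explicitly approved.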
These are each very different in nature, and there’s no single security approach that’s going to address all of them. In this series we’ll focus on cases where the LLM is directly under attack, which excludes, for instance, human harm, but even then there’s a lot of ground to cover.
The third challenge has to do with the full stack. We also need to think about how an attacker might use an LLM to attack the software stack it runs on, blending the known and the new. So we might consider how an attacker could tie up a model server with processing inputs, or even whether they could exploit a vulnerability in the Transformers library or the Python runtime. Those are, as far as I know, completely hypothetical ideas right now, but certainly not outside the realms of possibility.
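As a sketch of guarding against the “tie up a model server” idea, a service might reject oversized prompts before they ever reach the model. The limit and function name below are illustrative assumptions, not recommendations for any particular model or serving stack.

```python
# Hypothetical sketch: bounding input size before a prompt reaches the
# model server, so a single request cannot monopolise inference capacity.

MAX_INPUT_CHARS = 4_000  # illustrative limit; tune to your model's context window

def validate_prompt(prompt: str) -> str:
    """Reject prompts that exceed the configured size limit."""
    if len(prompt) > MAX_INPUT_CHARS:
        raise ValueError(
            f"prompt of {len(prompt)} chars exceeds limit of {MAX_INPUT_CHARS}"
        )
    return prompt
```

In practice you would enforce limits in token counts rather than characters, and pair this with request timeouts and per-client quotas, but the principle is the same: classic resource-exhaustion defences still apply when the workload is an LLM.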
There’s a lot to talk about in this space, and we’ve organised the upcoming content into themes, with one or more blogs covering each theme:
Attacks on LLMs
The world of LLMs moves at a fast pace, with new models and variants of existing models springing into existence almost daily. Attacks and threats against these new models are growing at a similarly rapid pace, and are starting to make mainstream news as models are increasingly integrated into everyday life. In this theme, we’ll go into detail about known attacks against LLMs, what vulnerabilities LLMs have, and the consequences of a successful attack.
Mitigating Attacks
It’s not all doom and gloom: there are preventative measures and strategies that can be put in place to prevent attacks, and a lot of them come down to sensible engineering. In this theme, we’ll discuss how you mitigate attacks, what tooling and strategies enable you to do that, and the open challenges in mitigation.
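As one example of the “sensible engineering” category, per-client rate limiting caps how much work any single caller can demand of a model endpoint. The token-bucket sketch below is a generic illustration, not tied to any specific LLM serving stack.

```python
import time

class TokenBucket:
    """Toy per-client rate limiter: a generic, well-established control."""

    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity
        self.tokens = float(capacity)  # start with a full bucket
        self.refill_per_sec = refill_per_sec
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Return True if this request may proceed, consuming one token."""
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at bucket capacity.
        self.tokens = min(
            self.capacity,
            self.tokens + (now - self.last) * self.refill_per_sec,
        )
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# One bucket per client; a burst beyond capacity gets throttled.
bucket = TokenBucket(capacity=2, refill_per_sec=0.5)
```

Nothing here is LLM-specific, and that’s the point: much of defending LLMs is applying controls we already trust to a new kind of workload.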
LLMOps: Secure by Design
Pulling both of the previous themes together, we’ll introduce a concept called secure by design. Here, we’ll show you how to design and provision infrastructure so that it’s secure by design, helping you avoid the pitfalls that enable the majority of attacks on LLMs and the infrastructure around them.
A History of Attacks
Given the novelty of LLMs, a lot of the associated attacks are also novel. Despite that novelty, however, LLMs are based on established machine learning techniques, and the same is true of the attacks: many are modifications of existing attacks, adapted to work on LLMs. We’ll discuss that broader history, covering what the attacks have in common and where they come from.
Cybersecurity for LLMs is a topic that we’ve been thinking about a lot recently, and we’re excited about this new series of blog posts. We hope you’re as excited as we are, so stay tuned for updates!