
AI agents are having a moment – but what exactly are they, and are they actually useful? We wanted to find out the way we usually do: build something and see what happens.
In November 2024, Anthropic released the Model Context Protocol (MCP), an open standard designed for building agentic systems. That gave us the structure we needed to try building an agent that could observe, make decisions, take actions, and ideally get better over time.
We needed a clear, self-contained use case, so we picked chess.
Why Chess?
We weren’t trying to build a world-class player. Chess just gave us a neat sandbox to work in: clear rules, visible outcomes, and no messy edge cases to deal with. It let us focus on the actual architecture of the agent, without getting bogged down in unpredictability.
We ended up building the Fuzzy Labs Chess Agent. It used MCP to manage the control loop, and Claude as the language model. The big question: could the model actually play chess – not just pick moves from a list, but reason about the board and decide what to do?
How It Worked
Claude already knows a fair bit about chess thanks to its training data, but it can’t take actions on its own. So we built a set of eight tools and exposed them via an MCP server. These tools could:
- Start a new game (`create_game`)
- Log in to Lichess (`log_in`)
- Check the board state (`get_board_status`)
- Make moves (`make_move`)
- ...and so on.
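We won't reproduce the server here, but the idea is a named registry of tools the model can call by name. A minimal sketch, with hypothetical stub handlers standing in for the real MCP SDK and Lichess API calls:

```python
# Minimal sketch of an MCP-style tool registry. The handlers are
# hypothetical stubs, not the real MCP SDK or Lichess API.
from typing import Callable

TOOLS: dict[str, Callable[..., dict]] = {}

def tool(name: str):
    """Register a function as a callable tool, keyed by name."""
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register

@tool("create_game")
def create_game(colour: str = "white") -> dict:
    # The real tool would hit the Lichess API; this just echoes back.
    return {"game_id": "demo123", "colour": colour}

@tool("get_board_status")
def get_board_status(game_id: str) -> dict:
    # Stubbed: the real tool returns the live position from Lichess.
    return {"game_id": game_id, "fen": "startpos", "turn": "white"}

@tool("make_move")
def make_move(game_id: str, move: str) -> dict:
    return {"game_id": game_id, "played": move}

# The model names a tool and supplies arguments; the server dispatches:
result = TOOLS["create_game"](colour="black")
```

The model never executes anything itself; it only names a tool and its arguments, and the server does the work and returns the result as context.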
With those in place, Claude was able to log into Lichess, kick off a game, watch it play out, and make its own moves in response.
It wasn’t all smooth sailing – we ran into a few bugs, including formatting hiccups and moments where the model forgot which side it was playing. But it learned and improved, and eventually managed to complete an entire game on its own.
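That loop – check the board, ask the model for a move, play it – is simple to sketch. Here `ask_claude` and the hard-coded turns are stand-ins for the real Anthropic API call and live board state:

```python
# Sketch of the agent's observe-decide-act loop. `ask_claude` and the
# toy board states are stand-ins, not the real API or MCP server.

def ask_claude(fen: str, legal_moves: list[str]) -> str:
    """Stand-in for the model call: just takes the first legal move."""
    return legal_moves[0]

def play_game(max_turns: int = 3) -> list[str]:
    history = []
    # Toy board: a fixed set of "legal moves" offered each turn.
    turns = [["e2e4"], ["g1f3"], ["f1c4"]]
    for legal_moves in turns[:max_turns]:
        fen = "fake-fen"                      # observe: get_board_status
        move = ask_claude(fen, legal_moves)   # decide: model call
        history.append(move)                  # act: make_move
    return history

print(play_game())
```

The real version loops until the game ends and feeds each tool result back into the model's context before the next decision.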
What We Learned
Built-in chess smarts – Claude’s pre-training gave it a decent baseline understanding of the game. Handy for getting started.
MCP keeps things organised – Having a clear loop structure made the agent behave in a more consistent, predictable way than you’d get from a one-off chatbot prompt.
Tools = grounded reasoning – Giving the agent real-time access to the board meant its decisions were based on facts, not guesses.
Token costs add up fast – Each loop involved a lot of context, which burned through tokens quicker than expected. Something to watch if you’re planning production use.
Tools raise new questions – Giving the agent access to things like logins and gameplay means thinking seriously about trust, control, and safety.
Third-party servers can be a security risk – community-built MCP servers are becoming commonplace, and they open a threat vector for malicious code injection. If you use a third-party MCP server, check the legitimacy of the repository and consider reviewing the code yourself for malicious or unintended behaviour.
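On the token-cost point: because each turn re-sends the system prompt plus the full game history, the context grows every turn and total input tokens grow roughly quadratically with game length. A back-of-the-envelope illustration (all numbers are hypothetical, not our actual measurements):

```python
# Why costs grow fast when each turn re-sends the whole conversation.
# All numbers below are hypothetical, for illustration only.
SYSTEM_TOKENS = 1_000    # system prompt + tool definitions
TOKENS_PER_TURN = 300    # board state + model reply added per turn

def total_input_tokens(turns: int) -> int:
    # Turn t re-sends the system prompt plus all t previous exchanges.
    return sum(SYSTEM_TOKENS + TOKENS_PER_TURN * t for t in range(turns))

for n in (10, 40):
    print(n, "turns ->", total_input_tokens(n), "input tokens")
```

With these made-up numbers, a 40-turn game consumes over ten times the input tokens of a 10-turn game – worth modelling before moving an agent like this into production.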
What’s Next?
This was a quick experiment – we built the whole thing in a day – but it surfaced a lot of good questions for future work:
- How do we measure how “good” an agent is?
- Can we make it faster and cheaper to run?
- What does safe, scalable access control look like?
We’ll be digging into those next, applying what we’ve learned to more complex and realistic problems.
This chess project was just a starting point – a way to test ideas and get a feel for how these systems behave in the wild. More to come.