We Finally Ate Our Own Dog Food: What We Learnt Running Our First AI Hackathon

Here's an uncomfortable truth: we've been banging on about the transformative power of AI whilst our own workflows were still pretty manual. How can you sell the benefits of AI (and the need for proper MLOps) when you're not actually benefiting from it yourself?

If you're reading this and thinking "that feels familiar," you're not alone. Most companies—even brilliant AI and open source MLOps consultancy companies—are still running on workflows that would be familiar to someone from 5-10 years ago. We send the same emails repeatedly, jump between seventeen different tools to answer simple questions, and write proposals from scratch every single time. cough.

But as we know, the way we build software is about to change rapidly. Agentic coding tools like Claude Code are helping developers “vibe”: accelerating workflows, automating repetitive tasks, and tackling complex programming projects in ways we're still discovering. They are also dissolving the boundary between technical and non-technical work, turning anyone who can describe a problem into someone who can build a solution.

And it was during this time of thinking and vibing that Matt and Tom stumbled across Zapier's playbook about rolling out AI across their business, and one bit really resonated: how they ran regular AI hackathons to get everyone trying new tools and seeing how AI can optimise operations… and it sparked an idea.

It was time to eat our own dog food.

Our Day of Reckoning

Friday marked our day of reckoning: the first (of many) AI hackathon(s) here at Fuzzy Labs HQ. Eight pairs of Fuzzicians, armed with Claude Code, caffeine, and croissants* (*pain au chocolat and fruit pots, but it kind of ruins the alliterative power and rhythm here), set out to see what we could actually build when we turned our AI expertise inward.

9:00 – 9:30: Kick-off Briefing & Breakfast

Sam kicked things off with the essentials: keep data safe, test data only, and—in a moment of pure inspiration—announced that every team needed a brilliantly funny name. It was also the same moment that Sam realised that he was the only one without one...

Danny with the pastries and fruit pot fuel

9:30 – 9:45: Project Pitches

Each team had exactly 2 minutes to pitch what they were going to build. What followed was eight teams tackling nine projects* (*Chris and Alan took on an extra challenge to really flex) to hack our way into a more efficient version of ourselves.

The Magnificent Eight (And Their Questionable Team Names)

Team Prospect Hold’em

Team members: Max & Yaz. Tools: they kept their cards close to their chest but we later found out they used Model Context Protocol (MCP) servers for Apollo (built within the hackathon) and Capsule

The pitch: The age-old problem of outreach tools living in complete isolation.

We're using Apollo for outreach and Capsule as our CRM, but the two don't talk - so contact data and activity logs (calls, emails, LinkedIn messages) need to be entered manually. That means duplicated effort, patchy records, and lost insight into what's working.

The plan: To automate how we pull prospects from Apollo.io and add them into our CRM (Capsule) - ready for us to reach out to.

The AI: To use our existing MCP server for Capsule, and vibe code one for Apollo. From there, AI can help tailor outreach based on engagement, suggest next steps, and eventually build a feedback loop that refines messaging with every interaction.

The UI was also briefed by Max to look like Texas Hold'em or solitaire... which prompted the question of what Max actually gets up to at work.

Fuzzy Cauldron

Team members: Rhiannon & Tom. Tools: Claude Code, N8N

The pitch: The cauldron is our community - rather than the more traditional view of the sales funnel - it’s a steady simmer of ongoing content and conversation. It's how we stay close to the people who matter, keeping them warm until the time is right to connect.

But, right now, we don't actually know who's in the cauldron.

The plan: To pull all our community-based metrics (CRM, LinkedIn, event attendees and Substack subscribers) and get them all in one place.

The AI: To vibe code the data extraction, a “cauldron” dashboard and segmentation to give us one clear view of our community and how engaged they are, so we can keep the cauldron bubbling with purpose.

SPIDER (Strategic Predictive Intelligence Data for Entity and Relationships)

Team members: Max & Matt. Tools: Ollama, foundation models, Graph RAG, local LLM

The pitch: Information gathering is bottlenecked by manual analysis: it takes more time to collect facts than to discover strategic insights.

The plan: To create a tool that takes an article and extracts the entity relationships between people and organisations into a graph structure.

The AI: To use a local LLM to automatically extract information from feeds and model the connections as entity-relationship triplets (such as [Company] -> [Signs contract with] -> [Company]).

Then have the tool build visual graphs with local information and community summaries, giving a quick, digestible overview.
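To make the triplet idea concrete, here's a minimal sketch of how extracted triplets could be grouped into a graph. It is not SPIDER's actual code: the LLM extraction step isn't shown, and the names `Triplet`, `build_graph` and `neighbours`, along with the example entities, are all made up for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Triplet(NamedTuple):
    """One (subject, relation, object) fact extracted from an article."""
    subject: str
    relation: str
    obj: str


def build_graph(triplets):
    """Group triplets into an adjacency map: subject -> list of (relation, object)."""
    graph = defaultdict(list)
    for t in triplets:
        edge = (t.relation, t.obj)
        if edge not in graph[t.subject]:  # skip duplicates from repeated mentions
            graph[t.subject].append(edge)
    return dict(graph)


def neighbours(graph, entity):
    """Return every entity that `entity` points to directly."""
    return [obj for _, obj in graph.get(entity, [])]


# Hypothetical output of the LLM extraction step (entities are invented).
extracted = [
    Triplet("Acme Ltd", "signs contract with", "Globex"),
    Triplet("Jane Doe", "is CEO of", "Acme Ltd"),
    Triplet("Acme Ltd", "signs contract with", "Globex"),  # duplicate mention
    Triplet("Acme Ltd", "acquires", "Initech"),
]

graph = build_graph(extracted)
for subject, edges in graph.items():
    for relation, obj in edges:
        print(f"[{subject}] -> [{relation}] -> [{obj}]")
```

A plain dictionary is enough to show the shape of the data; a real tool would more likely hand the edges to a graph library or database for the visual side.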

Pitch Perfect

Team members: Sam & Shubham. Tools: Cursor, N8N, ChatGPT

The pitch: Customer proposals are very detailed and take a lot of time to write. We have a library of great previous proposals that we can use to help write future ones by populating some of the things we already know, so we can spend more quality time on the proposal-specific requirements we don’t (yet) know.

The plan: To use our existing private sector proposals as training context, add in our notes and technical exploration findings, then let AI generate a first draft ready to present to a human for review.

The AI: To use N8N to low code a solution that cuts proposal writing time dramatically whilst maintaining quality and consistency across all submissions.

BobCAT (Robbie's Commercial Agent Tool)

Team members: Danny & Robbie. Tools: Claude Code MCP

The pitch: “Can we get meaningful insights out of the MCP servers that we currently have?”

We think Claude (and LLMs in general) can do more for us as a business, answering specific questions such as “What was our capacity in February?” to generate really meaningful insights.

The plan and the AI: We already have 3 data sources connected to Claude here at Fuzzy Labs (Toggl, Breathe and CapsuleCRM), via MCP servers, but are they working? The plan for BobCAT was to test our MCP servers' ability to get Claude to correctly answer questions about the business - and maximise the level of detail in the answers it delivers.

AI-Char “HR” (A Better Hiring Automation)

Team members: James & Sav. Tools: Copilot, Cursor, Ollama, 9000 APIs (approximately)

The pitch: The admin around hiring, such as chasing CVs or synchronising calendars for interview slots, eats up massive chunks of time. Cue an opportunity for some proper intelligent automation.

The plan: To create an AI-powered hiring bot to triage job applicants and automate the journey whilst ensuring humans are at the forefront of our actual decision making. AI-Char will process applications, populate Jira tickets, wrangle calendars to book interviews, send acknowledgement emails and nag us about unreviewed candidates, all within one purpose-built place.

TROLL (Ticket Resolution & Organisational Lifecycle Liaison)

Team members: Alan & Chris. Tools: Cursor, Claude Code, Gemini CLI, Ollama and maybe Junie

The pitch: Writing clear and detailed Jira ticket descriptions is often seen as time-consuming and a little bit tedious. Team members would greatly benefit from a tool which helps populate the tickets with rich and detailed descriptions, freeing up team time for working on the ticket itself and all the good stuff in between.

The plan: TROLL rewrites Jira descriptions for clarity and conciseness, flags incomplete tickets, and asks clarifying questions to prevent vague tasks from entering the backlog. That means less time on paperwork and more time building features.

Team TROLL's second pitch: the docstringator!

The Docstring Whisperer acts as a pre-commit hook that reviews code for missing documentation and adds consistent, clear docstrings automatically.
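For a feel of what the checking half of a hook like this might involve, here's a rough sketch using Python's built-in `ast` module. It only finds functions and classes with no docstring; it doesn't write any, which is the bit the team handed to AI. It isn't Team TROLL's implementation, and `missing_docstrings`, `check_files` and the sample code are hypothetical names invented for this example.

```python
import ast


def missing_docstrings(source, filename="<string>"):
    """Return sorted (line, name) pairs for functions and classes with no docstring."""
    tree = ast.parse(source, filename=filename)
    missing = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            if ast.get_docstring(node) is None:
                missing.append((node.lineno, node.name))
    return sorted(missing)


def check_files(paths):
    """Report undocumented definitions; return 1 if any were found, else 0."""
    failed = False
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            for lineno, name in missing_docstrings(handle.read(), path):
                print(f"{path}:{lineno}: '{name}' has no docstring")
                failed = True
    return 1 if failed else 0


# Made-up sample code to run the check against.
sample = '''
def documented():
    """Say hello."""
    return "hello"

def undocumented(x):
    return x * 2

class Widget:
    def method(self):
        pass
'''

for lineno, name in missing_docstrings(sample):
    print(f"line {lineno}: '{name}' has no docstring")
```

Git treats a non-zero exit status from a pre-commit hook as a reason to stop the commit, so a script could pass the staged file paths to `check_files` and exit with its return value.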

The TIMelord (Tim's Interactive Metric Extractor Leveraging Org's Raw Data)

Team members: Misha & Tim. Tools: MCP, Cursor, Bespoken Agentic framework, Notion, Toggl

The pitch: Learning from past projects requires great and acrobatic manual feats, with data scattered across Jira, GitHub, Toggl, and people's heads. Without a quick, accurate view of what actually happened, we run the risk of making estimation errors or falling into scope creep traps.

The plan: To build Fuzzy Archaeology, a terminal AI tool that digs through individual project data to identify what actually happened versus what we estimated, creating structured insights for future planning.

The AI: Using the Bespoken Agentic framework, create a natural language interface connecting different data sources to generate business KPIs more efficiently.

9:45 – 12:15: Sprint 1 - The Deep Dive

This is where the magic happened. Teams disappeared into their corners at Fuzzy Labs HQ with laptops, determination, and varying degrees of caffeination. The energy was palpable.

What became clear within the first hour was that the teams weren't just diving straight into code (for the most part). They were using their AI tools as thinking partners, exploring possibilities and rapidly prototyping ideas. The teams that treated Claude Code and ChatGPT as collaborative thought partners rather than just code generators were quickly refining their ideas and filling in the gaps in their thinking with ease.

It was deeply collaborative, with Rhiannon and Max ditching their usual non-technical stances and getting to grips with terminals, repos and Claude Code - learning completely new skill sets alongside having their minds blown by vibe coding.

Robbie also used this time to do some cricketcal thinking.

Robbie was completely bowled over by Danny's ideas in team BobCAT

Max's first venture into vibe coding, vibes off the scale!

12:15 – 12:45: Lunch Break - Rudy's Pizza

Because you can't run a proper hackathon without pizza. Rudy’s delivered on the pizza and the teams were quickly sharing slices and discoveries, troubleshooting each other's problems, and getting cocky about what they were going to achieve in the second sprint.

13:45 – 16:30: Sprint 2 - The Home Stretch (When Everything Broke)

The second sprint was about turning prototypes into demos, moving from exploring possibilities to finalising working solutions. Well, the majority of our Fuzzy teams did… Rhiannon and Tom ran out of code credits and nearly broke Claude AI getting their UI working, frantically trying to persuade the Figma MCP and Claude Code to talk to each other properly as the demo deadline loomed.

All projects came to life, with lots of repos published and working tools that we plan to open source, including TROLL from Team TROLL and the Apollo MCP from Team Prospect Hold’em.

We were also left questioning if Max should be Business Relationship “Developer” after publishing the Apollo MCP on his brand new GitHub account. His first commit to the world of open source? A beautifully working integration, which may or may not have been built by Yaz.

16:30 – 17:30: Drinks & Demo Hour - The Grand Finale

Fuelled by beers and mini poppadoms, each team was meant to have 10 minutes* to present their solution, including Q&A (*we definitely went over the allotted time).

Of course, this being a proper demo session, some of the solutions also decided not to play ball. Things that had been working perfectly all afternoon suddenly refused to load or threw mysterious errors—blame the meeting room WiFi. But that's exactly what we wanted to see: where the tools struggled, what the real learnings were, and how the teams adapted.

But the main takeaway: every single team had built something genuinely useful. Actual solutions that would improve how we work. Some were ready to deploy immediately and open source to our community (more blogs and announcements to follow), others need a bit more development or combining together, but all have a bright future here at Fuzzy Labs.

Team BobCAT putting in the finishing touches

What We Learnt

We went into the hackathon with a desire to build some fun stuff, work together and see what these tools could actually do—would they fail us, break under pressure, or actually deliver? The goal was to try things out, not just build perfect solutions. Here's what we found:

Tool Performance:

We love Claude Code - it's brilliant at building and prototyping, especially for agentic workflows
ChatGPT is excellent for language and helping flesh out prompts, but Claude Code wins for actual development
A few teams tried Gemini CLI, falling back on Google's coding agent when they ran out of Claude Code credits, but to mixed reviews. There was also a desire to use Junie from JetBrains, but in the end it wasn’t used on the day - we definitely need another hackathon or tool comparison to explore this one properly!
The Figma MCP still needs lots of prompts to work smoothly. The idea is amazing: it breaks the traditional workflow of creating a design in Figma and then having a frontend dev translate it for the web, and instead hooks Figma and Claude Code up directly so you can code whilst you create. Having tried it out, though, we didn't feel it was quite there yet.

What Worked (And What Didn't):

Although it wasn’t a competition (on this occasion), it was agreed that Team TROLL won the day with sheer practical usefulness
Max's (and Yaz’s) Apollo MCP proved MCP servers can be built and deployed in a day
Rhiannon's cauldron animation showed impressive purely vibed UI capabilities
Max and Matt's SPIDER graphs demonstrated complex data visualisation is quickly achievable
There was a lot of over-ambition when it came to scraping tools
Matt learnt that if you threaten to fire Claude Code, it actually works pretty quickly and efficiently - this approach was also taken on by Sav and James, who used it to the same effect.

Approach Insights:

Treating AI tools as thinking partners rather than simple code generators led to better results. One of the best examples of this in Claude Code was using the prompts "think" < "think hard" < "think harder" < "ultrathink" to access different modes of thinking and analysis - to return (what we felt were) better and more specific results.
Rapid prototyping worked better than trying to build perfect solutions
Sharing discoveries across teams created unexpected opportunities, such as how we could even hook the ideas up together, creating even richer results (such as linking SPIDER with Prospect Hold’em or The Cauldron with BobCAT… watch this space)
The technical capabilities are there - the challenge is getting people to experiment fearlessly

Post-Hackathon Impact:

Sam built a complete ATS (Applicant Tracking System) in 2 hours the following Sunday, inspired by the AI-Char prototype - which was then proudly demoed in the Monday morning meet.

Open source links already available:

What's Next?

This was just the beginning. We'll be sharing detailed breakdowns of each project, the tools that worked (and those that didn't), and the lessons learnt along the way. Some solutions are already being integrated into our daily workflows, whilst others have sparked bigger conversations about how we want to work. We'll be open sourcing some of them too, because knowledge shared is knowledge multiplied.

The real victory? We're no longer an AI company that doesn't use much AI. We're an AI company that's genuinely excited about using AI to work better, faster, and with fewer repetitive tasks eating into our time for actual problem-solving.

The reality is that AI tools work best when you focus on the human workflows they can augment, not replace. The most successful teams treat these tools as thought partners rather than magic code generators. They explore possibilities, prototype rapidly, and share discoveries across technical and non-technical users.

And here's the beautiful part: you don't even need to be an AI company to do this. You just need problems worth solving and people willing to spend a day seeing what's possible.

Our challenge to you? Don't just read about it, do it. Pick a day, gather your team, identify your most annoying workflow problems, and see what you can build. You don't need perfect planning or the latest tools—you just need curiosity and a willingness to experiment.

Plus, we've now got enough material for at least three more blog posts, a comprehensive AI tools comparison, and "The Fuzzy Guide to Running a Hackathon"… so brace yourselves, plenty more to come!

Want to know more about specific projects or thinking about running your own internal hackathon? Get in touch - we're always up for sharing what we've learnt. And if you do run your own hackathon, tell us about it. We'd love to hear what you discover.
