Applied AI & LLM engineering
Retrieval-augmented search, multi-step agents, generative tooling, structured output, evals and guardrails — wired straight into your Django models with full observability and cost telemetry.
Webbyfox is a small, senior UK studio that ships ambitious software — headless Wagtail CMS platforms, Django backends, retrieval pipelines and AI agents — for product teams, publishers, universities and the public sector. We pair deep Django engineering with applied LLM work, so your AI features feel inevitable, not bolted on.
We move in small, senior squads — usually one lead engineer plus a specialist. Everything we ship is production-ready Python: typed, tested, observable, and easy for your team to take over.
LLM features built as first-class parts of your product: retrieval-augmented search, multi-step agents, generative tooling, structured output, evals and guardrails — all wired into your Django models with observability and cost telemetry from day one.
Greenfield Django backends and rescue work for existing monoliths. Domain modelling, DRF / async APIs, background workers, multi-tenant patterns, and migration strategies that don't break production.
Editor-first publishing platforms in Wagtail — StreamFields, custom blocks, headless GraphQL/REST, multi-site, and AI-assisted editorial tooling.
The boring foundations that make AI features dependable — infra, CI/CD, evals in CI, prompt versioning, cost & quality dashboards, on-call runbooks.
Two-week sprints to de-risk an idea: architecture, build-vs-buy, model selection, eval plan, cost envelope, and a roadmap your team can actually execute against.
We treat LLMs as just another dependency: typed inputs and outputs, deterministic tests, evals on every PR, and a clear story for cost, latency and failure. The result is AI features your team can own, debug and improve.
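A minimal sketch of what that boundary looks like in practice — the schema, function names and stubbed model client below are illustrative, not a real client integration:

```python
from dataclasses import dataclass
import json

@dataclass(frozen=True)
class Summary:
    """Typed contract for the model's output (hypothetical schema)."""
    headline: str
    bullet_points: list

def parse_summary(raw: str) -> Summary:
    """Validate raw model output against the schema; fail loudly, never silently."""
    data = json.loads(raw)
    if not isinstance(data.get("headline"), str):
        raise ValueError("headline must be a string")
    if not isinstance(data.get("bullet_points"), list):
        raise ValueError("bullet_points must be a list")
    return Summary(headline=data["headline"], bullet_points=data["bullet_points"])

def summarise(text: str, complete) -> Summary:
    """`complete` is any callable prompt -> raw string, so tests can stub it."""
    prompt = f"Summarise as JSON with 'headline' and 'bullet_points':\n{text}"
    return parse_summary(complete(prompt))

# Deterministic test double: no network, no flakiness, runs on every PR.
def fake_complete(prompt: str) -> str:
    return json.dumps({"headline": "Q3 results", "bullet_points": ["revenue up"]})
```

Because the model client is injected, the same code path runs in unit tests with a stub and in production with a real provider.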
pgvector or Pinecone, hybrid BM25 + dense, query rewriting, citation-grounded answers. Wired into your existing Wagtail or Django models — no parallel content store.
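The fusion step can be sketched in a few lines — here with a crude term-overlap stand-in for BM25 and toy vectors standing in for real embeddings, merged with reciprocal rank fusion so neither score scale needs tuning:

```python
import math

# Toy corpus and embeddings (illustrative; a real system queries pgvector/Pinecone).
DOCS = {
    "a": "wagtail streamfield blocks for editors",
    "b": "django async views and background workers",
    "c": "retrieval augmented generation with pgvector",
}
EMBEDDINGS = {"a": [0.1, 0.9], "b": [0.8, 0.2], "c": [0.4, 0.6]}

def keyword_rank(query: str) -> list:
    """Crude stand-in for BM25: rank documents by query-term overlap."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(text.split())), doc_id) for doc_id, text in DOCS.items()]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]

def dense_rank(query_vec: list) -> list:
    """Rank documents by cosine similarity to the query embedding."""
    def cosine(u, v):
        return sum(a * b for a, b in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return [d for _, d in sorted(((cosine(query_vec, v), d) for d, v in EMBEDDINGS.items()), reverse=True)]

def hybrid_rank(query: str, query_vec: list, k: int = 60) -> list:
    """Reciprocal rank fusion: sum 1/(k + rank) across both rankings."""
    scores = {}
    for ranking in (keyword_rank(query), dense_rank(query_vec)):
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```

In production the two rankings come from Postgres full-text/BM25 and a vector index, but the fusion logic is exactly this small.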
LangGraph-style state machines, tool calling against your real services, human-in-the-loop, structured streaming. Memory, recovery and observability baked in from day one.
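Stripped of any framework, the loop is an explicit plan → act → observe state machine with an escape hatch. A sketch under stated assumptions — the tool, planner and escalation string below are hypothetical, and `plan_step` stands in for the LLM:

```python
# Hypothetical tool the agent may call; in production these hit real services.
def lookup_order(order_id: str) -> str:
    return f"order {order_id}: shipped"

TOOLS = {"lookup_order": lookup_order}

def run_agent(goal: str, plan_step, max_steps: int = 5) -> str:
    """Tiny explicit state machine: plan -> act -> observe, bounded and recoverable.

    `plan_step` maps (goal, history) to an action dict such as
    {"tool": "lookup_order", "arg": "42"} or {"final": "..."}.
    """
    history = []
    for _ in range(max_steps):
        action = plan_step(goal, history)
        if "final" in action:
            return action["final"]
        if action.get("tool") not in TOOLS:
            return "ESCALATE_TO_HUMAN"          # human-in-the-loop fallback
        observation = TOOLS[action["tool"]](action["arg"])
        history.append((action, observation))   # memory the planner sees next step
    return "ESCALATE_TO_HUMAN"                   # step budget exhausted

# Deterministic planner stub, so the loop itself is unit-testable.
def scripted_planner(goal, history):
    if not history:
        return {"tool": "lookup_order", "arg": "42"}
    return {"final": history[-1][1]}
```

Keeping the loop this explicit is what makes recovery, step budgets and observability straightforward to bolt on.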
Offline eval suites, regression tests in CI, prompt versioning, PII redaction and red-team harnesses. So you can ship LLM changes with the same confidence as any other deploy.
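The CI gate itself is unglamorous. A minimal sketch, assuming a golden case set and a pass-rate threshold (the cases and stub model here are illustrative):

```python
# Golden eval cases -- hypothetical examples; real suites live alongside the code.
GOLDEN = [
    {"input": "2+2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
]

def run_evals(model, cases, threshold: float = 0.9):
    """Score the model against golden cases; the build fails below threshold."""
    passed = sum(1 for case in cases if model(case["input"]) == case["expected"])
    score = passed / len(cases)
    return score, score >= threshold

# Stub model so the gate's own logic stays deterministic in tests.
def stub_model(prompt: str) -> str:
    return {"2+2": "4", "capital of France": "Paris"}.get(prompt, "")
```

Wired into CI, `run_evals` runs on every PR that touches a prompt, exactly like any other test suite.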
Wagtail dashboards for content teams — drafts, tone-matching, translation, alt-text, SEO suggestions and asset tagging. Editors stay in control; the AI does the grunt work.
Llama, Mistral, Qwen and friends on your own GPUs or VPC. We handle vLLM, batching, quantisation and the security paperwork. Suitable for regulated and public-sector clients.
These are indicative engagement patterns rather than named client case studies; specifics sit under NDA. We're happy to share real examples (with permission) on a discovery call.
Hybrid retrieval across articles, images and video. Cited answers in the publication's voice, editor dashboards for tone & bias review, and a fail-soft fallback to classical search when the LLM is uncertain.
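The fail-soft pattern is simple to state in code. A hedged sketch — the confidence score, threshold and stub functions are illustrative, not the production implementation:

```python
def answer(query: str, llm_answer, classical_search, min_confidence: float = 0.7):
    """Serve the LLM answer only when it is confident; otherwise degrade
    gracefully to classical search instead of guessing."""
    text, confidence = llm_answer(query)
    if text and confidence >= min_confidence:
        return {"mode": "llm", "answer": text}
    return {"mode": "classical", "answer": classical_search(query)}

# Stubs for demonstration only.
def confident_llm(query):
    return ("Cited answer about " + query, 0.92)

def unsure_llm(query):
    return ("", 0.3)

def classical(query):
    return ["result 1 for " + query, "result 2 for " + query]
```

The key property: a low-confidence model never produces a user-facing answer, it produces a normal search results page.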
Replace the BDR first-touch with a Django-hosted agent that reads CRM, books meetings and writes recap notes. Human-in-the-loop for anything above a confidence threshold.
One Wagtail install backing multiple public sites in multiple languages, with AI-assisted translation review and accessibility checks built into the editorial flow. WCAG 2.2 AA from day one.
Internal tool that turns every prompt change into a tracked release — offline evals, shadow traffic, drift detection and per-prompt cost. Confidence to ship LLM changes the same way you'd ship code.
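The core of such a tool is small: hash the template to get a version, and refuse to release when evals regress past tolerance. A sketch with assumed names and in-memory storage (a real system persists releases to a database):

```python
import hashlib

RELEASES = []  # in production: a DB table, not module state

def release_prompt(name: str, template: str, eval_score: float,
                   max_regression: float = 0.02) -> str:
    """Register a prompt version; block the release if evals regressed
    more than `max_regression` against the previous version."""
    version = hashlib.sha256(template.encode()).hexdigest()[:12]
    previous = [r for r in RELEASES if r["name"] == name]
    if previous and eval_score < previous[-1]["eval_score"] - max_regression:
        raise ValueError(f"{name}@{version}: eval regression, release blocked")
    RELEASES.append({"name": name, "version": version,
                     "template": template, "eval_score": eval_score})
    return version
```

Content-addressed versions mean the same template always gets the same ID, so per-prompt cost and drift dashboards have a stable key to aggregate on.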
Untangle migrations, kill circular imports, move to async-first DRF, add typed boundaries between domains. No big-bang rewrite — just a year of small, safe steps.
Most engagements run on a fixed-price discovery, then a rolling 4-week build cadence. You always own the code, the prompts and the evals.