Applied AI & LLM engineering
Retrieval-augmented search, multi-step agents, generative tooling, structured output, evals and guardrails — wired straight into your Django models with full observability and cost telemetry.
Webbyfox is a small, senior UK studio that ships ambitious software — headless Wagtail CMS platforms, Django backends, retrieval pipelines and AI agents — for product teams, publishers, universities and the public sector. We pair deep Django engineering with applied LLM work, so your AI features feel inevitable, not bolted on.
We move in small, senior squads — usually one lead engineer plus a specialist. Everything we ship is production-ready Python: typed, tested, observable, and easy for your team to take over.
LLM features built as first-class parts of your product: retrieval-augmented search, multi-step agents, generative tooling, structured output, evals and guardrails — all wired into your Django models with observability and cost telemetry from day one.
Greenfield Django backends and rescue work for existing monoliths. Domain modelling, DRF / async APIs, background workers, multi-tenant patterns, and migration strategies that don't break production.
Editor-first publishing platforms in Wagtail — StreamFields, custom blocks, headless GraphQL/REST, multi-site, and AI-assisted editorial tooling.
The boring foundations that make AI features dependable — infra, CI/CD, evals in CI, prompt versioning, cost & quality dashboards, on-call runbooks.
Two-week sprints to de-risk an idea: architecture, build-vs-buy, model selection, eval plan, cost envelope, and a roadmap your team can actually execute against.
We treat LLMs as just another dependency: typed inputs and outputs, deterministic tests, evals on every PR, and a clear story for cost, latency and failure. The result is AI features your team can own, debug and improve.
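A minimal sketch of what that boundary looks like in practice — the schema, function names and stubbed model client below are illustrative, not a real client integration:

```python
from dataclasses import dataclass
import json

@dataclass(frozen=True)
class Summary:
    """Typed contract for the model's output (hypothetical schema)."""
    headline: str
    bullet_points: list

def parse_summary(raw: str) -> Summary:
    """Validate raw model output against the schema; fail loudly, never silently."""
    data = json.loads(raw)
    if not isinstance(data.get("headline"), str):
        raise ValueError("headline must be a string")
    if not isinstance(data.get("bullet_points"), list):
        raise ValueError("bullet_points must be a list")
    return Summary(headline=data["headline"], bullet_points=data["bullet_points"])

def summarise(text: str, complete) -> Summary:
    """`complete` is any callable prompt -> raw string, so tests can stub it."""
    prompt = f"Summarise as JSON with 'headline' and 'bullet_points':\n{text}"
    return parse_summary(complete(prompt))

# Deterministic test double: no network, no flakiness, runs on every PR.
def fake_complete(prompt: str) -> str:
    return json.dumps({"headline": "Q3 results", "bullet_points": ["revenue up"]})
```

Because the model client is injected, the same code path runs in unit tests with a stub and in production with a real provider.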
pgvector or Pinecone, hybrid BM25 + dense, query rewriting, citation-grounded answers. Wired into your existing Wagtail or Django models — no parallel content store.
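The fusion step can be sketched in a few lines — here with a crude term-overlap stand-in for BM25 and toy vectors standing in for real embeddings, merged with reciprocal rank fusion so neither score scale needs tuning:

```python
import math

# Toy corpus and embeddings (illustrative; a real system queries pgvector/Pinecone).
DOCS = {
    "a": "wagtail streamfield blocks for editors",
    "b": "django async views and background workers",
    "c": "retrieval augmented generation with pgvector",
}
EMBEDDINGS = {"a": [0.1, 0.9], "b": [0.8, 0.2], "c": [0.4, 0.6]}

def keyword_rank(query: str) -> list:
    """Crude stand-in for BM25: rank documents by query-term overlap."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(text.split())), doc_id) for doc_id, text in DOCS.items()]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]

def dense_rank(query_vec: list) -> list:
    """Rank documents by cosine similarity to the query embedding."""
    def cosine(u, v):
        return sum(a * b for a, b in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return [d for _, d in sorted(((cosine(query_vec, v), d) for d, v in EMBEDDINGS.items()), reverse=True)]

def hybrid_rank(query: str, query_vec: list, k: int = 60) -> list:
    """Reciprocal rank fusion: sum 1/(k + rank) across both rankings."""
    scores = {}
    for ranking in (keyword_rank(query), dense_rank(query_vec)):
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```

In production the two rankings come from Postgres full-text/BM25 and a vector index, but the fusion logic is exactly this small.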
LangGraph-style state machines, tool calling against your real services, human-in-the-loop, structured streaming. Memory, recovery and observability baked in from day one.
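Stripped of any framework, the loop is an explicit plan → act → observe state machine with an escape hatch. A sketch under stated assumptions — the tool, planner and escalation string below are hypothetical, and `plan_step` stands in for the LLM:

```python
# Hypothetical tool the agent may call; in production these hit real services.
def lookup_order(order_id: str) -> str:
    return f"order {order_id}: shipped"

TOOLS = {"lookup_order": lookup_order}

def run_agent(goal: str, plan_step, max_steps: int = 5) -> str:
    """Tiny explicit state machine: plan -> act -> observe, bounded and recoverable.

    `plan_step` maps (goal, history) to an action dict such as
    {"tool": "lookup_order", "arg": "42"} or {"final": "..."}.
    """
    history = []
    for _ in range(max_steps):
        action = plan_step(goal, history)
        if "final" in action:
            return action["final"]
        if action.get("tool") not in TOOLS:
            return "ESCALATE_TO_HUMAN"          # human-in-the-loop fallback
        observation = TOOLS[action["tool"]](action["arg"])
        history.append((action, observation))   # memory the planner sees next step
    return "ESCALATE_TO_HUMAN"                   # step budget exhausted

# Deterministic planner stub, so the loop itself is unit-testable.
def scripted_planner(goal, history):
    if not history:
        return {"tool": "lookup_order", "arg": "42"}
    return {"final": history[-1][1]}
```

Keeping the loop this explicit is what makes recovery, step budgets and observability straightforward to bolt on.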
Offline eval suites, regression tests in CI, prompt versioning, PII redaction and red-team harnesses. So you can ship LLM changes with the same confidence as any other deploy.
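The CI gate itself is unglamorous. A minimal sketch, assuming a golden case set and a pass-rate threshold (the cases and stub model here are illustrative):

```python
# Golden eval cases -- hypothetical examples; real suites live alongside the code.
GOLDEN = [
    {"input": "2+2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
]

def run_evals(model, cases, threshold: float = 0.9):
    """Score the model against golden cases; the build fails below threshold."""
    passed = sum(1 for case in cases if model(case["input"]) == case["expected"])
    score = passed / len(cases)
    return score, score >= threshold

# Stub model so the gate's own logic stays deterministic in tests.
def stub_model(prompt: str) -> str:
    return {"2+2": "4", "capital of France": "Paris"}.get(prompt, "")
```

Wired into CI, `run_evals` runs on every PR that touches a prompt, exactly like any other test suite.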
Wagtail dashboards for content teams — drafts, tone-matching, translation, alt-text, SEO suggestions and asset tagging. Editors stay in control; the AI does the grunt work.
Llama, Mistral, Qwen and friends on your own GPUs or VPC. We handle vLLM, batching, quantisation and the security paperwork. Suitable for regulated and public-sector clients.
These are indicative engagement patterns rather than named client case studies; specifics sit under NDA. We're happy to share real examples (with permission) on a discovery call.
Hybrid retrieval across articles, images and video. Cited answers in the publication's voice, editor dashboards for tone & bias review, and a fail-soft fallback to classical search when the LLM is uncertain.
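The fail-soft pattern is simple to state in code. A hedged sketch — the confidence score, threshold and stub functions are illustrative, not the production implementation:

```python
def answer(query: str, llm_answer, classical_search, min_confidence: float = 0.7):
    """Serve the LLM answer only when it is confident; otherwise degrade
    gracefully to classical search instead of guessing."""
    text, confidence = llm_answer(query)
    if text and confidence >= min_confidence:
        return {"mode": "llm", "answer": text}
    return {"mode": "classical", "answer": classical_search(query)}

# Stubs for demonstration only.
def confident_llm(query):
    return ("Cited answer about " + query, 0.92)

def unsure_llm(query):
    return ("", 0.3)

def classical(query):
    return ["result 1 for " + query, "result 2 for " + query]
```

The key property: a low-confidence model never produces a user-facing answer, it produces a normal search results page.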
Replace the BDR first-touch with a Django-hosted agent that reads CRM, books meetings and writes recap notes. Human-in-the-loop for anything above a confidence threshold.
One Wagtail install backing multiple public sites in multiple languages, with AI-assisted translation review and accessibility checks built into the editorial flow. WCAG 2.2 AA from day one.
Internal tool that turns every prompt change into a tracked release — offline evals, shadow traffic, drift detection and per-prompt cost. Confidence to ship LLM changes the same way you'd ship code.
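The core of such a tool is small: hash the template to get a version, and refuse to release when evals regress past tolerance. A sketch with assumed names and in-memory storage (a real system persists releases to a database):

```python
import hashlib

RELEASES = []  # in production: a DB table, not module state

def release_prompt(name: str, template: str, eval_score: float,
                   max_regression: float = 0.02) -> str:
    """Register a prompt version; block the release if evals regressed
    more than `max_regression` against the previous version."""
    version = hashlib.sha256(template.encode()).hexdigest()[:12]
    previous = [r for r in RELEASES if r["name"] == name]
    if previous and eval_score < previous[-1]["eval_score"] - max_regression:
        raise ValueError(f"{name}@{version}: eval regression, release blocked")
    RELEASES.append({"name": name, "version": version,
                     "template": template, "eval_score": eval_score})
    return version
```

Content-addressed versions mean the same template always gets the same ID, so per-prompt cost and drift dashboards have a stable key to aggregate on.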
Untangle migrations, kill circular imports, move to async-first DRF, add typed boundaries between domains. No big-bang rewrite — just a year of small, safe steps.
Most engagements run on a fixed-price discovery, then a rolling 4-week build cadence. You always own the code, the prompts and the evals.