Voice-AI agent for enterprise telephony

A multi-tenant voice AI platform deployed for a European software company serving 20,000+ trade businesses - handling inbound calls end-to-end with GDPR-compliant infrastructure.

Year: 2026 - present
Stack: TypeScript, Fastify, WebSockets, Telnyx, Azure, Vue, PostgreSQL, pgvector, RAG, Prisma

The problem

The client is a European software company serving 20,000+ trade businesses internationally - painters, plumbers, electricians, decorators. These are businesses where the owner is usually the one doing the work.

That's the constraint: every call they answer is time they're not on a job. Every call they miss is a quote that goes to a competitor. Generic call centres and IVR systems haven't solved it - trade callers ask specific questions ("can you fit a new boiler before Friday?"), expect a real conversation, and won't sit through phone-tree menus.

The client wanted an AI agent that could answer these calls 24/7, hold a natural conversation, capture quote and callback requests, and stay strictly on-topic - across thousands of independent trade businesses, each with their own services, pricing, and knowledge base.

The brief was multi-tenant by definition. The execution had to be GDPR-clean, enterprise-grade, and operable by non-technical end users.

The approach

We specced the full architecture and proposed a four-stage build, each stage building on the last: the database, models, and abstractions were defined in Stage 1 so that Stages 2-4 would not require re-architecture. The proposal was accepted within 24 hours of the pitch call.

GDPR as the architectural driver. Every external dependency was evaluated against data-residency and minimal-third-party principles. Telnyx for telephony (developer-friendly, EU-region routing). Azure OpenAI for STT, LLM, and TTS - using the Data Zone Standard deployment type to keep all model inference inside the EU data zone, even when Azure routes requests between regions within it. Mailgun's EU region for transactional email of call summaries. No data leaves the EU at any layer of the stack.
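One way to make an "EU only" rule enforceable rather than a convention is a startup guard that refuses to boot if any provider is configured outside an allow-list of EU regions. The sketch below is a minimal illustration under stated assumptions: `ProviderConfig`, `EU_REGIONS`, `findNonEuProviders`, and `assertEuResidency` are hypothetical names, and the allow-list mixes a few real Azure region identifiers with a generic `"eu"` tag. It checks configuration only; it is not the project's actual residency mechanism.

```typescript
// Hypothetical provider configuration; field names are illustrative assumptions.
type ProviderConfig = {
  name: string;   // e.g. telephony, LLM, email
  region: string; // region identifier as configured for that provider
};

// Assumed allow-list: a few real Azure EU region names plus a generic "eu" tag.
const EU_REGIONS: ReadonlySet<string> = new Set([
  "eu",
  "westeurope",
  "northeurope",
  "swedencentral",
  "germanywestcentral",
  "francecentral",
]);

// Returns the names of providers whose configured region is not on the allow-list.
function findNonEuProviders(providers: ProviderConfig[]): string[] {
  return providers
    .filter((p) => !EU_REGIONS.has(p.region.toLowerCase()))
    .map((p) => p.name);
}

// Throws so a misconfigured deployment fails at startup instead of handling calls.
function assertEuResidency(providers: ProviderConfig[]): void {
  const offenders = findNonEuProviders(providers);
  if (offenders.length > 0) {
    throw new Error(`Non-EU providers configured: ${offenders.join(", ")}`);
  }
}
```

Calling `assertEuResidency` once during server startup turns a region typo into a failed deploy rather than a silent data-residency breach.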

Real-time voice pipeline. A Fastify backend on WebSockets handles the live audio stream - incoming caller audio is transcribed, sent to the LLM with the relevant tenant context, and the response streamed back through TTS, all under conversational latency budgets. Voice activity detection (VAD), barge-in handling, dual-audio prevention, and "is the caller pausing to think or done talking?" boundary detection were all calibrated empirically - the kind of work that decides whether the agent feels conversational or robotic.
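The turn-taking logic can be pictured as a small state machine fed one audio frame at a time: trailing silence shorter than a threshold is treated as the caller thinking, a longer gap ends the turn, and speech that starts while the agent is talking triggers barge-in (stop TTS so both sides are not audible at once). This is a hedged sketch, not the production pipeline: a simple RMS energy threshold stands in for whatever VAD the real system uses, and `Endpointer`, `TurnEvent`, the `agentSpeaking` flag, and all threshold values are hypothetical.

```typescript
// Events the endpointer can emit for each caller audio frame.
type TurnEvent = "none" | "speech_start" | "end_of_turn" | "barge_in";

interface EndpointerOptions {
  energyThreshold: number;    // RMS level at or above which a frame counts as speech
  frameMs: number;            // duration of each PCM frame in milliseconds
  endOfTurnSilenceMs: number; // trailing silence needed before the caller is "done"
}

// Root-mean-square energy of 16-bit PCM samples, normalised to roughly [0, 1].
function rms(frame: Int16Array): number {
  if (frame.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < frame.length; i++) {
    const x = frame[i] / 32768;
    sum += x * x;
  }
  return Math.sqrt(sum / frame.length);
}

class Endpointer {
  private readonly opts: EndpointerOptions;
  private inSpeech = false;
  private silenceMs = 0;
  // Hypothetical hook: set to true while TTS audio is streaming to the caller.
  agentSpeaking = false;

  constructor(opts: EndpointerOptions) {
    this.opts = opts;
  }

  // Feed one caller frame; returns the turn event it triggers, if any.
  push(frame: Int16Array): TurnEvent {
    const isSpeech = rms(frame) >= this.opts.energyThreshold;
    if (isSpeech) {
      this.silenceMs = 0; // any speech resets the pause timer
      if (!this.inSpeech) {
        this.inSpeech = true;
        if (this.agentSpeaking) {
          // Caller started talking over the agent: signal that TTS should stop.
          this.agentSpeaking = false;
          return "barge_in";
        }
        return "speech_start";
      }
      return "none";
    }
    if (this.inSpeech) {
      this.silenceMs += this.opts.frameMs;
      // Short pauses count as thinking; only a long enough gap ends the turn.
      if (this.silenceMs >= this.opts.endOfTurnSilenceMs) {
        this.inSpeech = false;
        this.silenceMs = 0;
        return "end_of_turn";
      }
    }
    return "none";
  }
}
```

The two numbers that matter most are `energyThreshold` and `endOfTurnSilenceMs`; set the silence window too short and the agent interrupts callers mid-thought, too long and it feels sluggish, which is why this kind of value is calibrated empirically rather than derived.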

Tenant-scoped RAG over a knowledge base. Each tenant uploads their own knowledge - PDFs, URLs, service descriptions. The content is chunked and embedded into a pgvector-backed index. At call time, the caller's question is matched against the tenant's chunks via cosine similarity; only the most relevant context is sent to the LLM. This keeps prompts small (cost), responses on-topic (quality), and inputs strictly tenant-scoped (no cross-tenant leakage). The similarity threshold was tuned specifically to balance that cost/quality trade-off: too low and irrelevant chunks inflate the prompt, too high and useful context is dropped.
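The retrieval step reduces to three operations: a hard filter on tenant, a cosine-similarity cutoff, and a top-k limit. The sketch below runs in memory so the logic is easy to follow; in a pgvector setup the same ranking would happen in SQL, where `<=>` is pgvector's cosine-distance operator (similarity = 1 - distance). `Chunk`, `retrieveContext`, the table and column names in the comment, and the parameters `minSimilarity` and `topK` are hypothetical illustrations, not the project's schema or tuned values.

```typescript
// Roughly equivalent pgvector query (table and column names are hypothetical):
//   SELECT text, 1 - (embedding <=> $2) AS score
//   FROM chunks WHERE tenant_id = $1
//   ORDER BY embedding <=> $2 LIMIT $3;

interface Chunk {
  tenantId: string;
  text: string;
  embedding: number[];
}

// Cosine similarity of two equal-length vectors; 0 if either vector is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimension mismatch");
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Returns at most topK of this tenant's chunks whose similarity clears the threshold,
// best match first. The tenant filter runs before any scoring.
function retrieveContext(
  chunks: Chunk[],
  tenantId: string,
  queryEmbedding: number[],
  minSimilarity: number,
  topK: number,
): { text: string; score: number }[] {
  return chunks
    .filter((c) => c.tenantId === tenantId)
    .map((c) => ({ text: c.text, score: cosineSimilarity(queryEmbedding, c.embedding) }))
    .filter((r) => r.score >= minSimilarity)
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

Filtering by tenant before scoring, rather than after, means another tenant's chunks are never even ranked, which is what guarantees no cross-tenant leakage regardless of how similar their content is.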

A full multi-tenant SaaS to operate it. Two dashboards - a super-admin view for the client's brand/tenant management, and a tenant-facing view for the trade businesses themselves. 4-role RBAC (super admin, brand admin, tenant admin, tenant user). Per-tenant configuration of the AI greeting, GDPR disclosure, and CTA text - the three things the agent says immediately after answering. Working hours configurable per tenant and fed to the AI as context. Pricing model with included minutes plus per-minute overage, calculated and visible per tenant. Call logs with full transcripts, AI-generated summaries, classifications (quote request, callback, general question), and configurable email notifications.
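Two pieces of this paragraph are concrete enough to sketch: the four-role hierarchy and the included-minutes-plus-overage pricing. The code below is illustrative only. The source names the roles but not their permissions, so the rule "a role may manage any role strictly below it" is an assumption, as is rounding each call up to the next whole minute. `ROLE_RANK`, `canManage`, `Plan`, `billableMinutes`, and `overageChargeCents` are hypothetical names; amounts are kept in integer cents to avoid floating-point drift.

```typescript
type Role = "super_admin" | "brand_admin" | "tenant_admin" | "tenant_user";

// Assumed strict hierarchy, highest rank first.
const ROLE_RANK: Record<Role, number> = {
  super_admin: 3,
  brand_admin: 2,
  tenant_admin: 1,
  tenant_user: 0,
};

// Assumption: an actor can manage only roles strictly below its own.
function canManage(actor: Role, target: Role): boolean {
  return ROLE_RANK[actor] > ROLE_RANK[target];
}

interface Plan {
  includedMinutes: number;       // minutes covered by the plan each period
  overageCentsPerMinute: number; // price per minute beyond the included amount
}

// Assumption: each call is billed in whole minutes, rounded up.
function billableMinutes(callDurationsSec: number[]): number {
  return callDurationsSec.reduce((sum, s) => sum + Math.ceil(s / 60), 0);
}

// Overage charge in cents for one billing period; zero while within included minutes.
function overageChargeCents(plan: Plan, callDurationsSec: number[]): number {
  const used = billableMinutes(callDurationsSec);
  const overageMinutes = Math.max(0, used - plan.includedMinutes);
  return overageMinutes * plan.overageCentsPerMinute;
}
```

For example, calls of 59, 61, and 120 seconds bill as 1 + 2 + 2 = 5 minutes; on a plan with 3 included minutes at 15 cents per extra minute, the overage is 2 minutes, or 30 cents.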

Built for what comes next. The architecture is adapter-style throughout - swap one LLM for another, one STT/TTS provider for another, plug in any number of ERPs. The system is already wired for ERP integration in a later stage, enabling status-update calls ("when is my work happening?") and automated work-ticket creation. Interactive setup scripts automate Azure provisioning end-to-end.
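An adapter-style design means the call-handling code depends only on small interfaces, with each vendor wrapped behind one of them. The sketch below shows the idea for a single conversational turn plus a name-keyed registry for choosing an LLM. Every interface, method, and function name here (`SpeechToText`, `LanguageModel`, `TextToSpeech`, `handleTurn`, `registerLlm`, `resolveLlm`) is a hypothetical illustration, not the project's contracts or any vendor SDK's API; a real streaming pipeline would pass audio chunks rather than whole buffers.

```typescript
// Hypothetical provider-neutral interfaces.
interface SpeechToText {
  transcribe(audio: Uint8Array): Promise<string>;
}
interface LanguageModel {
  complete(systemPrompt: string, userText: string): Promise<string>;
}
interface TextToSpeech {
  synthesize(text: string): Promise<Uint8Array>;
}

interface VoicePipeline {
  stt: SpeechToText;
  llm: LanguageModel;
  tts: TextToSpeech;
}

// One conversational turn, written only against the interfaces, so any provider
// implementing them can be swapped in without changing this function.
async function handleTurn(
  p: VoicePipeline,
  systemPrompt: string,
  callerAudio: Uint8Array,
): Promise<Uint8Array> {
  const text = await p.stt.transcribe(callerAudio);
  const reply = await p.llm.complete(systemPrompt, text);
  return p.tts.synthesize(reply);
}

// Hypothetical registry: configuration selects an LLM adapter by name.
const llmProviders = new Map<string, () => LanguageModel>();

function registerLlm(name: string, factory: () => LanguageModel): void {
  llmProviders.set(name, factory);
}

function resolveLlm(name: string): LanguageModel {
  const factory = llmProviders.get(name);
  if (!factory) throw new Error(`Unknown LLM provider: ${name}`);
  return factory();
}
```

The same pattern would extend to an ERP adapter later: the call flow asks an interface for job status or ticket creation, and each ERP integration implements it.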

The outcome

Stage 1 shipped to production with the AI agent handling real inbound calls in two languages at launch. By the first demo, the client had 20 beta tenants lined up. After Stage 1 closed, the client immediately accelerated Stage 2 - months ahead of the original roadmap.

The agent answers calls 24/7, stays strictly on-topic to each tenant's business, doesn't hallucinate beyond the tenant's knowledge base, captures quote and callback requests, and routes call summaries to the right people.

Next stages: expansion to additional countries, ERP integration for live job-status calls, and expanded multilingual coverage.
