Architecture
The architecture, end to end.
CallFunnel is the control plane above the model, the channel, and the CRM. Here is the whole system: four planes, a single call traced from dial tone to decision, and a scaling model bounded by provider capacity rather than our own hardware.
The system
Four planes
Responsibilities are split so each plane scales and fails independently. The media layer is stateless and disposable; durable memory lives below it.
Holds the live call: bidirectional audio, voice-activity detection, turn-taking and barge-in. It is stateless beyond the call itself, holding call state only for the call's lifetime.
The brain between calls: workflows, the outbound dialer, the per-customer timeline, approval-in-the-loop, rule-book authoring and usage metering. This is where the agent decides and acts.
Tenant lifecycle, authentication, per-tenant secret provisioning and billing. Every tenant is isolated here before a single byte of their data is touched.
The durable layer: conversation and timeline state, the retrieval knowledge base, queues and sessions, and call recordings. State lives here so the media layer above can stay disposable.
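The media plane's turn-taking can be illustrated with a deliberately simple energy-threshold detector. This is a minimal sketch, not CallFunnel's actual VAD: it assumes caller audio has already been decoded into frames of 16-bit PCM integers, and the threshold, the hangover count and the `TurnDetector` name are all hypothetical. Production VADs are usually model-based, but the event logic (speech start, barge-in while the agent is talking, end of turn after sustained silence) has the same shape.

```python
import math
from dataclasses import dataclass
from typing import Optional


def frame_rms(samples: list[int]) -> float:
    """Root-mean-square energy of one frame of 16-bit PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


@dataclass
class TurnDetector:
    threshold: float = 500.0  # hypothetical RMS level that counts as speech
    hangover_frames: int = 3  # hypothetical: silent frames before end-of-turn
    in_speech: bool = False
    silent_run: int = 0

    def push(self, samples: list[int], agent_speaking: bool) -> Optional[str]:
        """Feed one caller frame; return an event name, or None if nothing changed."""
        speech = frame_rms(samples) >= self.threshold
        if speech:
            self.silent_run = 0
            if not self.in_speech:
                self.in_speech = True
                # Caller started talking while the agent is speaking: barge-in,
                # so the media plane should cut the agent's audio.
                return "barge_in" if agent_speaking else "speech_start"
            return None
        if self.in_speech:
            self.silent_run += 1
            # Require several consecutive silent frames so short pauses
            # inside a sentence don't end the caller's turn.
            if self.silent_run >= self.hangover_frames:
                self.in_speech = False
                self.silent_run = 0
                return "end_of_turn"
        return None
```

All state lives on the detector instance, which exists only for one call, matching the stateless-beyond-the-call design above.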
The latency path
Anatomy of a call
One conversational turn, from the caller's voice to the agent's reply. Everything on this path is tuned for sub-second response; everything that isn't latency-critical happens off it.
Mid-turn, the model can call tools that reach the conductor — a record lookup, a concession that needs Slack approval, an escalation. The timeline write and the call recording are persisted asynchronously, off the latency path, so they never slow the conversation.
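Keeping persistence off the latency path amounts to scheduling the write without awaiting it. The sketch below assumes an asyncio-based worker; `TurnHandler`, `generate_reply`, `persist` and `drain` are hypothetical names, and the sleeps stand in for the model round trip and the timeline or recording write. It shows the pattern, not CallFunnel's implementation.

```python
import asyncio


class TurnHandler:
    """Hypothetical turn handler: reply on the latency path, persistence off it."""

    def __init__(self) -> None:
        self.timeline: list[str] = []  # stand-in for the durable timeline store
        self._background: set[asyncio.Task] = set()  # keep refs so tasks aren't GC'd

    async def generate_reply(self, utterance: str) -> str:
        # Stand-in for the streaming speech-to-text -> LLM -> text-to-speech round trip.
        await asyncio.sleep(0.01)
        return f"reply to: {utterance}"

    async def persist(self, utterance: str, reply: str) -> None:
        # Stand-in for the timeline write or recording upload; deliberately slower.
        await asyncio.sleep(0.05)
        self.timeline.append(f"{utterance} -> {reply}")

    async def handle_turn(self, utterance: str) -> str:
        reply = await self.generate_reply(utterance)
        # Schedule persistence without awaiting it, so the reply goes back to
        # the caller as soon as it is ready.
        task = asyncio.create_task(self.persist(utterance, reply))
        self._background.add(task)
        task.add_done_callback(self._background.discard)
        return reply

    async def drain(self) -> None:
        """Wait for outstanding background writes, e.g. at call teardown."""
        if self._background:
            await asyncio.gather(*self._background)
```

Holding task references in a set matters because asyncio keeps only weak references to running tasks; `drain` gives call teardown a point to flush writes before the worker releases the call.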
The economics of scale
How it scales
Because the heavy machine learning is bought as streaming APIs, the platform's own per-call work is I/O-bound — hold two sockets, run light VAD, relay frames. No GPUs, no model hosting. That makes scale a matter of adding identical, inexpensive nodes.
- Stateless and horizontal. Voice workers hold call state only for the call's lifetime; all durable state is externalized — so you scale out behind a sticky-WebSocket load balancer.
- Warm-buffer autoscaling. KEDA scales on the active-call count, not CPU, and keeps a warm buffer of spare capacity, because a live call can't absorb a cold start during a surge.
- A deterministic ceiling. The binding limit is provider concurrency — telephony channels and model rate tiers. Raise a tier, add nodes, and capacity grows linearly rather than hitting an architectural wall.
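The scaling rules above reduce to two small formulas: how many workers to run for the current load plus a warm buffer, and how many concurrent calls the whole system can hold. This is a sketch under stated assumptions: the function names and parameters are hypothetical, each worker is assumed to hold a fixed number of calls, and the buffer is modeled as extra call slots added to the metric before the `ceil(metric / target)` step that Kubernetes HPA (and therefore KEDA) target-value scaling uses.

```python
import math


def desired_workers(active_calls: int, calls_per_worker: int,
                    warm_buffer_calls: int, min_workers: int = 1) -> int:
    """Workers needed to hold current calls plus a warm buffer of spare slots."""
    needed = math.ceil((active_calls + warm_buffer_calls) / calls_per_worker)
    # Never scale to zero: a surge must land on an already-warm worker.
    return max(min_workers, needed)


def call_ceiling(workers: int, calls_per_worker: int,
                 telephony_channels: int, model_concurrency: int) -> int:
    """Concurrent calls the system can hold: the tightest of the three limits."""
    return min(workers * calls_per_worker, telephony_channels, model_concurrency)
```

With illustrative numbers (not measured figures), 10 workers at 50 calls each give 500 slots, so a 300-channel telephony tier is the binding limit: `call_ceiling(10, 50, 300, 400)` is 300. Raising the telephony tier to 600 moves the ceiling to the model's 400, which is the "raise a tier, add nodes" behavior described above.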
Trust boundary
Isolation & security
- Identity at every hop. RS256 JWTs issued by Keycloak are validated at each service; per-tenant groups isolate identity and authorization.
- Secrets sealed. Per-tenant credentials live in Vault — provider keys and tokens never sit in plaintext env or config.
- PII redacted before the model. Sensitive fields are stripped before anything reaches the LLM or the knowledge base.
- Tenant-scoped data. Recordings, timeline and knowledge are partitioned per tenant, with export-and-erase for data-subject requests.
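Redacting PII before the model can be as simple as replacing matches with typed placeholders. The patterns, the `PII_PATTERNS` table and the `redact` function below are hypothetical and deliberately narrow; they illustrate where redaction sits (before text reaches the LLM or is indexed), not CallFunnel's actual rule set. Production redaction usually combines rules like these with a model-based PII detector.

```python
import re

# Hypothetical, deliberately narrow patterns. Order matters: emails and card
# numbers are replaced before the broader phone pattern can claim their digits.
PII_PATTERNS: dict[str, re.Pattern] = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    # 13 to 16 digits, optionally separated by single spaces or hyphens.
    "CARD": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact(text: str) -> str:
    """Replace each PII match with a typed placeholder such as [EMAIL]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

Typed placeholders keep the sentence readable for the model ("the caller gave an email address") without exposing the value itself.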
Want the full component list and the reason behind each pick? See the Tech Stack. Or start free and drive a real call yourself.