Vocale AI Architecture Map

Last updated: April 11, 2026
Core System Built: 75%
Remaining (Multi-Tenant SIP): ~45%
Total Planes/Layers: 5
New Tasks Identified: 12
Current Architecture (What Exists Today)
Legend: Built & Working | Partially Done | Not Started / New
1. User Plane (Dashboard)
Next.js 16 + React 19 + Supabase + LiveKit Client | Vercel
Built
Auth (NextAuth)
Voice Agent Page
Knowledge Base (PDF upload)
Customer DB Connect
Field Mapping Editor
Tickets Page
Operations / Logs
Settings Page
Org Management (basic)
SIP Number Config UI
Multi-Number Management
Call Analytics Dashboard
2. Media Plane (AI Brain)
Python + vLLM (Qwen3-8B) + LiveKit Agents + Docker | Vast.ai GPU
Built
STT (Faster-Whisper)
LLM (vLLM / Qwen3-8B)
TTS (Kokoro-82M)
Agent Orchestration
Tool Calling (RAG + Tickets)
Barge-in / Interruption
Conversation Memory (Redis)
Filler Phrases
Tenant Config Loading
SIP Session Creation
Multi-Number Org Routing
Concurrent Call Handling
3. RAG Plane
Python + Pinecone + Docling | Vast.ai
Built
PDF Ingestion
OCR (Tesseract)
Chunking + Embedding
Vector Search (Pinecone)
Namespace Isolation
4. Customer DB Plane
Python + MySQL + WireGuard | Vast.ai
Built
External DB Connect
Field Mapping
AES-256 Encryption
Paginated Queries
WireGuard Tunnel
5. Telephony Layer (SIP / PBX Integration)
LiveKit SIP + Yeastar PBX + SIP Trunks | VPS 72.62.70.61
Needs Work
LiveKit SIP Server (running)
Basic SIP Trunk (Yeastar)
Hardcoded routing (only dial 1000)
sip_numbers table (exists, basic)
sip_trunks table (exists, basic)
Dynamic number-to-org routing
Multi-number per tenant
Concurrent call support
Dashboard SIP config UI
LiveKit SIP dispatch rules (dynamic)
IP whitelisting / security
Connected Planes Ecosystem
Diagram (Connected Planes Ecosystem): Telephony (SIP Trunking), AI Brain (Media Plane, STT / LLM / TTS), User Dashboard (Admin & Tenant), RAG (Knowledge Base), Customer DB (External Tools).
PBX and SIP: The Simple Explanation for Developers

What is a PBX?

Think of a PBX (Private Branch Exchange) as a router for phone calls, just as your home WiFi router sends internet traffic to the right device.

Developer Analogy: A PBX is like an API gateway (e.g., Nginx or Kong). When someone calls a phone number, the PBX checks its routing rules and decides where to send the call, just as Nginx reads the URL path and forwards the request to the right backend server.

The client (our stakeholder) owns a Yeastar P550 PBX. It is a physical box in their office that manages all their phone lines. Businesses will get phone numbers from the client. When a customer dials one of those numbers, the Yeastar PBX forwards the call to our AI voice agent.

What is a SIP Trunk?

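SIP (Session Initiation Protocol) is the signaling language phones and phone systems use to set up, manage, and end calls. A SIP trunk is like a configured connection between two servers, similar to an HTTP connection between your app and an external API.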

Developer Analogy: Imagine your Next.js app needs to talk to an external API. You configure the base URL and port. A SIP trunk is the same thing but for phone calls. The Yeastar PBX is configured with our server's IP (72.62.70.61) and port (15060). When a call comes in, it sends an INVITE (like an HTTP POST request) to that address.

We currently have a "Peer Trunk" set up. This means no username/password registration is needed: the PBX sends calls directly to our IP address, and trust is based on the source IP, not on credentials.

What are INVITE, Call-ID, and SDP?

When a call starts, the PBX sends a SIP INVITE message. Think of it as the phone call equivalent of a new HTTP request.

Developer Analogy:

INVITE = POST request saying "I want to start a call"
Call-ID = A unique request ID (like a UUID in your API). Each call gets its own Call-ID so the system knows which call is which, even when 10 calls happen at the same time.
SDP (Session Description Protocol) = The request body / payload. It contains: "I want to send audio using codec X, send it to my IP on port Y."
200 OK response = Our server's response saying: "Cool, I accept. Send YOUR audio to MY IP on port Z." The PBX then replies with an ACK to confirm, and the call is established.
BYE = DELETE request. "This call is over, hang up."
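To make the analogy concrete, here is a minimal sketch of what a trimmed INVITE looks like on the wire and how the routing-relevant fields can be read out of it. LiveKit SIP does this parsing for us, so this is illustration only; the message, IPs, and Call-ID are invented, and `parse_invite` is a hypothetical helper, not a full SIP parser.

```python
import re

# Hypothetical INVITE from the Yeastar PBX. Numbers, IPs, and the Call-ID are invented.
RAW_INVITE = (
    "INVITE sip:1001@72.62.70.61:15060 SIP/2.0\r\n"
    "From: <sip:+390551112233@203.0.113.10>;tag=abc123\r\n"
    "To: <sip:1001@72.62.70.61>\r\n"
    "Call-ID: 8f3c2a1e-demo@203.0.113.10\r\n"
    "CSeq: 1 INVITE\r\n"
    "Content-Type: application/sdp\r\n"
    "\r\n"
    "v=0\r\n"
    "c=IN IP4 203.0.113.10\r\n"
    "m=audio 20000 RTP/AVP 0 8\r\n"
)


def sip_user(value: str) -> str:
    """'<sip:1001@host>;tag=x' -> '1001' (the user part of a SIP URI)."""
    match = re.search(r"sip:([^@;>]+)@", value)
    return match.group(1) if match else ""


def parse_invite(raw: str) -> dict:
    """Extract the fields relevant to routing and media. Not a full SIP parser."""
    head, _, body = raw.partition("\r\n\r\n")  # a blank line separates headers from the SDP body
    request_line, *header_lines = head.split("\r\n")
    method = request_line.split(" ")[0]
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    media_ip, media_port, payload_types = None, None, []
    for line in body.split("\r\n"):
        if line.startswith("c=IN IP4 "):   # where the caller wants audio sent
            media_ip = line.split()[-1]
        elif line.startswith("m=audio "):  # m=audio <port> <proto> <payload types...>
            parts = line.split()
            media_port = int(parts[1])
            payload_types = parts[3:]      # "0" = PCMU, "8" = PCMA (both G.711)

    return {
        "method": method,
        "call_id": headers.get("call-id"),
        "called_number": sip_user(headers.get("to", "")),
        "caller_number": sip_user(headers.get("from", "")),
        "media_ip": media_ip,
        "media_port": media_port,
        "payload_types": payload_types,
    }


info = parse_invite(RAW_INVITE)
print(info["called_number"], info["call_id"], info["media_port"])  # 1001 8f3c2a1e-demo@203.0.113.10 20000
```

The `called_number` field is the piece our multi-tenant routing depends on; the SDP lines are what LiveKit uses to decide where to send RTP.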

What is RTP?

After the SIP "handshake" (INVITE, 200 OK, ACK), the actual voice audio flows over a different protocol called RTP (Real-time Transport Protocol).

Developer Analogy: SIP is like the REST API that sets up the call. RTP is like a WebSocket that carries the actual audio data. SIP negotiates "where" to send audio (IP + port). Then RTP streams the audio to that address using UDP packets.
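Each RTP packet is a small UDP datagram: a 12-byte fixed header (RFC 3550) followed by audio bytes. The sketch below packs and unpacks that header to show what actually flows on a call's port. LiveKit handles this for us; the helper names are hypothetical and the sketch ignores header extensions and padding.

```python
import struct

RTP_HEADER = struct.Struct("!BBHII")  # 12 bytes, network byte order (RFC 3550 fixed header)


def build_rtp_packet(seq: int, timestamp: int, ssrc: int, payload: bytes,
                     payload_type: int = 0, marker: bool = False) -> bytes:
    byte0 = 2 << 6  # version 2, no padding, no extension, CSRC count 0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)  # payload type 0 = PCMU (G.711 mu-law)
    header = RTP_HEADER.pack(byte0, byte1, seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload


def parse_rtp_packet(packet: bytes) -> dict:
    byte0, byte1, seq, timestamp, ssrc = RTP_HEADER.unpack_from(packet)
    csrc_count = byte0 & 0x0F
    payload_start = RTP_HEADER.size + 4 * csrc_count  # ignores header extensions and padding
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload": packet[payload_start:],
    }


# 20 ms of G.711 at 8 kHz = 160 samples = 160 bytes, so the timestamp advances by 160 per packet.
SAMPLES_PER_PACKET = 8000 * 20 // 1000
audio = b"\xff" * SAMPLES_PER_PACKET  # placeholder audio bytes
first = build_rtp_packet(seq=1, timestamp=0, ssrc=0x1234ABCD, payload=audio)
second = build_rtp_packet(seq=2, timestamp=SAMPLES_PER_PACKET, ssrc=0x1234ABCD, payload=audio)
print(parse_rtp_packet(second)["timestamp"], len(second))  # 160 172
```

The sequence number lets the receiver detect loss and reordering over UDP, which is why RTP does not need TCP-style delivery guarantees.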

Each active call needs its own RTP port. This is why the client's message mentions an "RTP port pool" (like 20000 to 20500). Each new call grabs a free port, so the audio streams never mix together.
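A port pool can be sketched as a free list keyed by Call-ID. This is a hypothetical illustration of what LiveKit does internally, not code we need to ship. It assumes the common convention that RTP uses even ports and leaves the next odd port for RTCP, and it treats the range as half-open [start, end).

```python
from collections import deque
from typing import Deque, Dict


class RtpPortPool:
    """One RTP port per call, drawn from [start, end) and returned on hang-up (hypothetical)."""

    def __init__(self, start: int, end: int):
        first_even = start + (start % 2)  # RTP conventionally uses even ports; port + 1 is left for RTCP
        # Stop at end - 1 so that port + 1 (RTCP) also stays inside the range.
        self._free: Deque[int] = deque(range(first_even, end - 1, 2))
        self._by_call: Dict[str, int] = {}

    def allocate(self, call_id: str) -> int:
        if call_id in self._by_call:  # retransmitted INVITE for the same call: reuse its port
            return self._by_call[call_id]
        if not self._free:
            raise RuntimeError("RTP port pool exhausted")  # a real server might reject, e.g. with 503
        port = self._free.popleft()
        self._by_call[call_id] = port
        return port

    def release(self, call_id: str) -> None:  # called on BYE
        port = self._by_call.pop(call_id, None)
        if port is not None:
            self._free.append(port)

    def available(self) -> int:
        return len(self._free)


pool = RtpPortPool(20000, 20500)
a = pool.allocate("call-a")  # 20000
b = pool.allocate("call-b")  # 20002
```

Keying by Call-ID is what keeps two simultaneous calls from ever sharing a port.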

How Does This Map to Our System?

Here is the good news: LiveKit already handles most of the SIP complexity. The LiveKit SIP service acts as our SIP server. It:

1. Listens for INVITE messages on port 15060
2. Automatically manages Call-IDs and RTP ports
3. Creates a LiveKit "Room" for each call
4. Converts phone audio (G.711) to WebRTC audio (Opus)
5. Dispatches our Python agent into the Room, where it talks to the caller

Key Insight
The client's message about RTP port pools and session management is describing what a SIP server must do internally. LiveKit already does this for us. What WE need to build is the routing logic: when a call comes in, figure out WHICH organization it belongs to, load THAT org's config, and connect THAT org's AI agent.

What About Concurrent Calls?

The client says one business can receive multiple calls at the same time. This is handled at two levels:

Level 1 (PBX side, client handles): The Yeastar PBX sends each call as a separate INVITE with a unique Call-ID. It does NOT confuse them. This is automatic.

Level 2 (Our side, we handle): LiveKit creates a separate Room for each call. Each Room gets its own agent instance. Currently, our session_gate.py allows only ONE active session per org per GPU. This is the bottleneck we need to fix.

The Current "Hardcoded 1000" Problem

Right now, our LiveKit SIP is set up with a single rule: "any call to number 1000 goes to the agent." This was fine for testing, but for production multi-tenant:

Problem: If Business A's customer calls +39-055-1234567 and Business B's customer calls +39-055-7654321, both numbers hit our server. But how does our agent know which business's knowledge base and customer database to use?

Solution: We need to look at the "called number" (the number the customer dialed) in the SIP INVITE, look it up in our sip_numbers table, find the matching organization_id, and load that org's config. This is the main routing problem we need to solve.
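One practical detail in this lookup is that the same number can arrive formatted differently ("+39-055-1234567" vs "+390551234567"). The sketch below normalizes before matching, with a dict standing in for the sip_numbers table. The helper names and data are hypothetical, and it does not handle prefixes such as "0039" versus "+39".

```python
import re
from typing import Dict, Optional


def normalize_number(raw: str) -> str:
    """Strip formatting so '+39-055-1234567', '+39 055 1234567' and '+390551234567' compare equal."""
    raw = raw.strip()
    digits = re.sub(r"\D", "", raw)
    return "+" + digits if raw.startswith("+") else digits


# Stand-in for the sip_numbers table (phone_number -> organization_id). Hypothetical data.
SIP_NUMBERS: Dict[str, str] = {
    normalize_number("+39-055-1234567"): "org-business-a",
    normalize_number("+39-055-7654321"): "org-business-b",
}


def org_for_called_number(called: str) -> Optional[str]:
    return SIP_NUMBERS.get(normalize_number(called))


print(org_for_called_number("+390551234567"), org_for_called_number("1000"))  # org-business-a None
```

Storing numbers already normalized (one canonical format in the table) keeps the lookup a single indexed equality match.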

Complete Call Flow: End to End
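Call flow (participants: Customer, Yeastar PBX, LiveKit SIP, Our Agent, Supabase DB):
1. Customer -> Yeastar PBX: dials +39-055-1001.
2. Yeastar PBX -> LiveKit SIP: SIP INVITE (called=1001) + unique Call-ID + SDP.
3. LiveKit SIP: creates a LiveKit Room.
4. LiveKit SIP -> Our Agent: dispatches the agent to the Room (room metadata has called_number).
5. Our Agent -> Supabase DB: lookup_org_by_sip_number (NEW).
6. Supabase DB -> Our Agent: org_id + number_config.
7. Our Agent -> Supabase DB: get_tenant_config(org_id).
8. Supabase DB -> Our Agent: prompts, language, tools.
9. Our Agent -> Supabase DB: create_sip_session(org_id).
10. Our Agent -> Customer: answers the call.
11. Customer <-> Our Agent: two-way audio (RTP via PBX, WebRTC via LiveKit).
12. Our Agent: RAG + Customer DB queries scoped to the org_id namespace.
13. Customer hangs up; Yeastar PBX -> LiveKit SIP: SIP BYE.
14. LiveKit SIP: Room closes.
15. Our Agent -> Supabase DB: session ended + log.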
Reading This Diagram
Yellow arrows = PBX side (client's system, not our code).
Blue arrows = LiveKit handles automatically.
Red arrows = NEW code we need to write (the number-to-org lookup).
Green arrows = Exists but needs updating for multi-tenant routing.
Purple = Active call (already working).
Developer Task Breakdown: What to Build Next
Summary
The core AI agent (STT, LLM, TTS, RAG, Customer DB) is done. What remains is making the SIP telephony layer multi-tenant aware and building the dashboard UI for managing phone numbers.
Phase 1: SIP Multi-Tenant Routing (Backend, High Priority)
1. Update LiveKit SIP Dispatch Rules for Dynamic Routing
P0
What: Currently LiveKit has a single dispatch rule matching only "1000". We need to change this so it accepts ANY called number from the Yeastar PBX and creates a room with that number in the metadata.

Where: LiveKit SIP configuration (likely via LiveKit API or config YAML). Check docs/Yeastar-setup.md for the current config.

How: Instead of matching called_number: "1000", configure a wildcard or regex match. The dispatch rule should put the called_number and caller_number into the Room metadata so the agent can read it.

Simple example: Think of it like changing a route from /api/agent/1000 to /api/agent/:phoneNumber. The number becomes a variable, not a hardcoded value.

Reference: LiveKit SIP dispatch rules docs: https://docs.livekit.io/agents/sip/
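The route analogy can be written out as plain logic. This is not LiveKit's dispatch-rule API (the real change happens in the LiveKit SIP configuration); it is a hypothetical sketch of what the new rule should do: accept any well-formed called number, give each call its own room, and carry the numbers in metadata. The room-name format and the number pattern are assumptions.

```python
import json
import re
from typing import Optional


# Before: one hardcoded rule, like the route /api/agent/1000.
def legacy_rule(called_number: str) -> bool:
    return called_number == "1000"


# After (hypothetical): any extension or E.164-style number, like /api/agent/:phoneNumber.
NUMBER_PATTERN = re.compile(r"\+?\d{3,15}")


def dynamic_rule(called_number: str, caller_number: str, call_id: str) -> Optional[dict]:
    """Return the room to create for this call, or None if the called number is malformed."""
    if not NUMBER_PATTERN.fullmatch(called_number):
        return None
    return {
        # One room per call, so simultaneous calls to the same number never share a room.
        "room_name": f"sip-{called_number}-{call_id}",
        # The agent reads these back in its entrypoint to find the organization.
        "metadata": json.dumps({"called_number": called_number, "caller_number": caller_number}),
    }


print(legacy_rule("1001"), dynamic_rule("1001", "+390551112233", "abc123")["room_name"])  # False sip-1001-abc123
```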
2. Build Number-to-Organization Lookup in Agent
P0
What: When the agent joins a SIP room, it reads the called_number from room metadata, queries sip_numbers table, and gets the organization_id.

Where: services/supabase_client.py already has lookup_org_by_sip_number(). Verify that it works and returns the right data, including any per-number config overrides.

Where (agent): agent.py and voice_agent.py. The agent's entrypoint function needs to: (1) check if it's a SIP call, (2) extract called_number, (3) call lookup, (4) use that org_id for everything.

Simple example: Like a middleware in Express.js that reads the API key from the request header, looks up the tenant, and attaches req.tenant for all downstream handlers.
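In the spirit of that middleware, a minimal sketch of the entrypoint step might look like this. It assumes the room metadata is a JSON string with `called_number` and `caller_number` keys; `lookup_org` stands in for `lookup_org_by_sip_number()`, whose real signature and return shape may differ. `CallContext` and `UnknownNumberError` are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class CallContext:
    """Everything downstream code needs to stay scoped to one tenant (like req.tenant)."""
    organization_id: str
    called_number: str
    caller_number: str


class UnknownNumberError(Exception):
    """The dialed number is not in sip_numbers (or is inactive)."""


def resolve_call_context(
    room_metadata: str,
    lookup_org: Callable[[str], Optional[str]],
) -> Optional[CallContext]:
    meta = json.loads(room_metadata) if room_metadata else {}
    called = meta.get("called_number")
    if not called:
        return None                       # (1) not a SIP call: keep the existing non-SIP path
    org_id = lookup_org(called)           # (3) number -> organization_id
    if org_id is None:
        raise UnknownNumberError(called)  # fail closed: never guess a tenant
    return CallContext(org_id, called, meta.get("caller_number", ""))


numbers = {"1001": "org-a", "1002": "org-a", "2001": "org-b"}  # hypothetical rows
ctx = resolve_call_context('{"called_number": "2001", "caller_number": "+390551112233"}', numbers.get)
print(ctx.organization_id)  # org-b
```

Failing closed on an unknown number matters more than usual here: the alternative is answering a call with another tenant's knowledge base and customer data.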
3. Remove Single-Session Gate for SIP Calls
P0
What: services/session_gate.py currently enforces ONE active session per organization per GPU. For SIP calls, we need to allow multiple concurrent calls per org (a restaurant might get 5 calls at once).

How: Change the gate from "1 session per org" to "N sessions per org" with a configurable limit. Better still, track the per-org concurrent count and reject new calls with SIP 486 Busy Here when the limit is reached.

Consideration: GPU memory is limited. Each concurrent call needs its own STT + LLM context + TTS. For a single RTX 3090, maybe 3 to 5 concurrent sessions max. This should be configurable per org or per GPU.
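A minimal in-process sketch of the "N sessions per org" gate, with both a per-org limit and a per-GPU cap. `CallGate` is a hypothetical name, not the current session_gate.py API. Because it only counts within one process, multiple agent workers or GPUs would need a shared counter (Redis or an active_calls table) instead.

```python
import threading
from collections import defaultdict
from typing import Dict, Optional


class CallGate:
    """Per-org and per-GPU concurrent call limits (hypothetical sketch)."""

    def __init__(self, gpu_limit: int, default_org_limit: int, org_limits: Optional[Dict[str, int]] = None):
        self._gpu_limit = gpu_limit
        self._default_org_limit = default_org_limit
        self._org_limits = dict(org_limits or {})
        self._active: Dict[str, int] = defaultdict(int)
        self._total = 0
        self._lock = threading.Lock()  # guard the counters if calls are accepted from several threads

    def try_acquire(self, org_id: str) -> bool:
        with self._lock:
            org_limit = self._org_limits.get(org_id, self._default_org_limit)
            if self._total >= self._gpu_limit or self._active[org_id] >= org_limit:
                return False  # the call should be rejected, e.g. with SIP 486 Busy Here
            self._active[org_id] += 1
            self._total += 1
            return True

    def release(self, org_id: str) -> None:  # call when the session ends
        with self._lock:
            if self._active[org_id] > 0:
                self._active[org_id] -= 1
                self._total -= 1

    def active(self, org_id: str) -> int:
        with self._lock:
            return self._active[org_id]


gate = CallGate(gpu_limit=4, default_org_limit=3)
print([gate.try_acquire("restaurant") for _ in range(4)])  # [True, True, True, False]
```

Checking the GPU cap first means one busy tenant can never push the box past what it can actually serve.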
4. Per-Number Config Override Loading
P1
What: When a call comes to number X, the agent should: (1) load org-level config from agent_instances, (2) check if the sip_numbers row has a config_override JSONB, (3) merge the override on top of the org config.

Where: services/tenant_config.py. The TenantConfig dataclass needs a merge method that accepts per-number overrides.

Why: A business might have number 1001 for orders (greeting: "Ready to take your order") and 1002 for support (greeting: "How can I help?"). Same org, different agent behavior.
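A merge method could look like the sketch below. The fields shown are a hypothetical subset; the real TenantConfig in services/tenant_config.py may have different names. The sketch skips `None` values, silently ignores unknown keys (logging them may be preferable), and never lets an override change `organization_id`.

```python
from dataclasses import dataclass, fields, replace
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class TenantConfig:
    # Hypothetical subset of fields; the real dataclass may differ.
    organization_id: str
    greeting: str
    language: str
    system_prompt: str

    def merge(self, override: Optional[Dict[str, Any]]) -> "TenantConfig":
        """Return a new config with per-number overrides applied on top of org defaults."""
        if not override:
            return self
        overridable = {f.name for f in fields(self)} - {"organization_id"}  # tenant identity is never overridable
        changes = {k: v for k, v in override.items() if k in overridable and v is not None}
        return replace(self, **changes)


org_config = TenantConfig("org-a", "How can I help?", "it", "You are a helpful assistant.")
orders_line = org_config.merge({"greeting": "Ready to take your order"})
print(orders_line.greeting, "|", orders_line.language)  # Ready to take your order | it
```

Using a frozen dataclass and returning a new instance keeps the org-level config untouched, so two concurrent calls on different numbers cannot leak overrides into each other.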
Phase 2: Database Schema Updates
5. Enhance sip_numbers Table
P0
Current state: sip_numbers table exists with basic columns (phone_number, organization_id, active).

New columns needed:
- label (text) = Human-friendly name like "Sales Line"
- config_override (jsonb) = Per-number agent config overrides
- assigned_by (uuid, nullable) = Who assigned this number (for audit)
- assigned_at (timestamptz) = When it was assigned
- max_concurrent_calls (int, default 1) = How many simultaneous calls this number can handle

Index: Add unique index on phone_number (one number can only belong to one org at a time).
RLS: Update Row Level Security policies so tenants can only see their own numbers.
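The key property of the unique index is that a second assignment of the same number fails at the database, not in application code. The sketch uses SQLite from the standard library as a stand-in for Postgres/Supabase, so the column types are approximations (noted in comments); the `id` column and the `assign_number` helper are assumptions. RLS has no SQLite equivalent and is not shown.

```python
import sqlite3

# SQLite stand-in for the Postgres table; Postgres types are noted in comments.
DDL = """
CREATE TABLE sip_numbers (
    id                   INTEGER PRIMARY KEY,
    phone_number         TEXT    NOT NULL,
    organization_id      TEXT    NOT NULL,
    active               INTEGER NOT NULL DEFAULT 1,        -- boolean
    label                TEXT,
    config_override      TEXT,                              -- jsonb
    assigned_by          TEXT,                              -- uuid, nullable
    assigned_at          TEXT    DEFAULT CURRENT_TIMESTAMP, -- timestamptz
    max_concurrent_calls INTEGER NOT NULL DEFAULT 1
);
-- One number can belong to only one org at a time.
CREATE UNIQUE INDEX sip_numbers_phone_number_key ON sip_numbers (phone_number);
"""


def make_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.executescript(DDL)
    return db


def assign_number(db: sqlite3.Connection, phone_number: str, org_id: str, label: str = "") -> bool:
    """Insert a number for an org; returns False if another org already holds it."""
    try:
        with db:  # commits on success, rolls back on error
            db.execute(
                "INSERT INTO sip_numbers (phone_number, organization_id, label) VALUES (?, ?, ?)",
                (phone_number, org_id, label),
            )
        return True
    except sqlite3.IntegrityError:
        return False


db = make_db()
print(assign_number(db, "+390551001", "org-a", "Sales Line"))  # True
print(assign_number(db, "+390551001", "org-b"))                # False
```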
6. Add Concurrent Sessions Tracking Table
P1
What: Create an active_calls table (or use Redis) to track which calls are currently active per org and per number.

Columns:
- id (uuid)
- session_id (fk to sessions)
- organization_id (fk to organizations)
- sip_number_id (fk to sip_numbers)
- started_at (timestamptz)
- call_id_sip (text) = The SIP Call-ID for debugging

Purpose: Quick COUNT query to check concurrent calls before accepting a new call. Rows are deleted when calls end.
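A plain "COUNT, then INSERT" has a race: two INVITEs arriving together can both see a free slot. The sketch below, again using SQLite as a stand-in, takes the write lock before counting so the check and the insert are atomic. In Postgres, a comparable guard would be locking the relevant sip_numbers row with `SELECT ... FOR UPDATE` inside the same transaction. The helper names are hypothetical.

```python
import sqlite3
import uuid
from typing import Optional


def open_db() -> sqlite3.Connection:
    # isolation_level=None: autocommit mode, so the BEGIN/COMMIT below are under our control.
    db = sqlite3.connect(":memory:", isolation_level=None)
    db.execute(
        """CREATE TABLE active_calls (
               id              TEXT PRIMARY KEY,
               session_id      TEXT,
               organization_id TEXT NOT NULL,
               sip_number_id   TEXT NOT NULL,
               started_at      TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
               call_id_sip     TEXT
           )"""
    )
    return db


def try_start_call(db, org_id: str, sip_number_id: str, call_id_sip: str, max_calls: int) -> Optional[str]:
    """Admit the call if the number is under its limit; returns the new row id, or None if full."""
    db.execute("BEGIN IMMEDIATE")  # take the write lock first so two INVITEs cannot both pass the count
    try:
        (active,) = db.execute(
            "SELECT COUNT(*) FROM active_calls WHERE sip_number_id = ?", (sip_number_id,)
        ).fetchone()
        if active >= max_calls:
            db.execute("ROLLBACK")
            return None
        row_id = str(uuid.uuid4())
        db.execute(
            "INSERT INTO active_calls (id, organization_id, sip_number_id, call_id_sip) VALUES (?, ?, ?, ?)",
            (row_id, org_id, sip_number_id, call_id_sip),
        )
        db.execute("COMMIT")
        return row_id
    except Exception:
        db.execute("ROLLBACK")
        raise


def end_call(db, row_id: str) -> None:
    db.execute("DELETE FROM active_calls WHERE id = ?", (row_id,))


db = open_db()
first = try_start_call(db, "org-a", "num-1001", "call-1", max_calls=1)
second = try_start_call(db, "org-a", "num-1001", "call-2", max_calls=1)
print(first is not None, second)  # True None
end_call(db, first)
```

Because rows are only removed on call end, a call that drops without a BYE would hold its slot forever, which is why ghost session cleanup belongs alongside this table.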
Phase 3: Dashboard UI (Frontend)
7. SIP Numbers Management Page
P0
What: A new page or section in Settings where the business can see their assigned phone numbers and configure each one.

Features:
- Table showing: phone number, label, status (active/inactive), concurrent call limit
- Click a number to edit: label, greeting override, language override, system prompt override
- Show live call count (how many active calls on this number right now)
- Toggle active/inactive

Note: The actual number ASSIGNMENT is done by the client (stakeholder) via an admin panel or API. The business tenant can only VIEW and CONFIGURE their assigned numbers, not create new ones.

Where: src/app/(dashboard)/settings/ or a new src/app/(dashboard)/phone-numbers/ route.
8. Admin Panel: Number Assignment (for Client/Stakeholder)
P1
What: A super-admin interface where the client (who owns the PBX) can:
- Add new phone numbers to the system
- Assign numbers to organizations
- Reassign numbers between organizations
- See which numbers are in use across all tenants

Who uses this: Only the client/stakeholder, not the business tenants.

Consideration: Could be a separate admin route (e.g., /admin/numbers) or an API-only feature initially.
9. Call Analytics / Live Call View
P2
What: The Operations page should show:
- Currently active calls (live count) per number
- Call history with caller number, duration, tickets created
- Simple analytics: calls per day, average duration, busiest hours

Where: Enhance existing src/app/(dashboard)/operations/page.tsx
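The three analytics numbers reduce to simple aggregations over call records. A minimal sketch, assuming each record is a `(started_at, duration_seconds)` pair; the real session/log schema may store these differently, and timezone handling is left out.

```python
from collections import Counter
from datetime import datetime
from statistics import mean


def summarize_calls(calls):
    """calls: iterable of (started_at: datetime, duration_s: number). Hypothetical record shape."""
    calls = list(calls)
    if not calls:
        return {"calls_per_day": {}, "avg_duration_s": 0.0, "busiest_hour": None}
    per_day = Counter(started.date().isoformat() for started, _ in calls)
    per_hour = Counter(started.hour for started, _ in calls)
    return {
        "calls_per_day": dict(per_day),
        "avg_duration_s": mean(duration for _, duration in calls),
        "busiest_hour": per_hour.most_common(1)[0][0],  # hour of day (0-23) with the most calls
    }


history = [
    (datetime(2026, 4, 10, 12, 5), 60),
    (datetime(2026, 4, 10, 12, 40), 120),
    (datetime(2026, 4, 11, 9, 0), 30),
]
print(summarize_calls(history))
```

In production these would more likely be SQL `GROUP BY` queries over the sessions table; the logic is the same.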
Phase 4: Security & Production Readiness
10. SIP IP Whitelisting
P0
What: Currently, anyone who knows our IP (72.62.70.61:15060) can send SIP calls. We need to restrict this to ONLY the Yeastar PBX's IP address.

How: Two approaches:
- Firewall level: iptables/ufw rule on the VPS to DROP any UDP to port 15060 that does not come from the Yeastar's public IP
- LiveKit level: Configure the SIP trunk's allowed_addresses to only accept the PBX IP

The client's message explicitly mentions this: "The bot's SIP listener should silently drop any UDP packets on port 15060 that do not originate from the Yeastar PBX's designated IP address."
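The firewall rule or the trunk's allowed_addresses is where this should actually be enforced. The sketch below only illustrates the decision itself, using Python's standard `ipaddress` module so the allowlist can hold a single IP or a CIDR range. 203.0.113.10 is a documentation address standing in for the Yeastar's real public IP; `make_source_filter` is a hypothetical helper.

```python
import ipaddress
from typing import Callable, Iterable


def make_source_filter(allowed: Iterable[str]) -> Callable[[str], bool]:
    """Build an allowlist check; entries may be single IPs or CIDR ranges."""
    networks = [ipaddress.ip_network(entry, strict=False) for entry in allowed]

    def is_allowed(source_ip: str) -> bool:
        try:
            address = ipaddress.ip_address(source_ip)
        except ValueError:
            return False  # malformed source: treat as untrusted
        return any(address in net for net in networks)

    return is_allowed


is_from_pbx = make_source_filter(["203.0.113.10"])

# A UDP listener would check the sender from recvfrom() and, on False,
# drop the datagram silently: no SIP error reply, so scanners learn nothing.
print(is_from_pbx("203.0.113.10"), is_from_pbx("198.51.100.7"))  # True False
```

Dropping silently rather than answering with an error is what the client's requirement asks for: an unanswered packet does not confirm that a SIP service is listening.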
11. Graceful Call Rejection (Capacity Limits)
P1
What: If the GPU is at capacity (all concurrent slots used), new calls should get a proper SIP rejection (486 Busy Here or 503 Service Unavailable) instead of just hanging.

How: Before the agent joins a Room, check the concurrent call count. If it is over the limit, do not join, and let LiveKit handle the SIP rejection. Alternatively, configure LiveKit dispatch rules with a max participant limit.

Also needed: ghost session cleanup. If a call drops without a SIP BYE (e.g., a network failure), detect that no inbound audio has arrived for 10 to 15 seconds and force-close the session to free the slot.
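A ghost-session watchdog can be sketched as "last time we received audio" per session plus a periodic sweep. The assumption here is that a live call keeps delivering audio frames even when the caller is quiet (true unless the PBX uses silence suppression), so "no frames at all" signals a dead call while "quiet audio" does not. `SilenceWatchdog` is a hypothetical name; the clock is injectable so the logic can be tested without waiting.

```python
import time
from typing import Callable, Dict, List


class SilenceWatchdog:
    """Flags sessions that have received no inbound audio for longer than timeout_s (hypothetical)."""

    def __init__(self, timeout_s: float = 15.0, clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s
        self._clock = clock
        self._last_audio: Dict[str, float] = {}

    def on_audio(self, session_id: str) -> None:  # call for every inbound audio frame
        self._last_audio[session_id] = self._clock()

    def on_bye(self, session_id: str) -> None:  # normal hang-up: stop tracking
        self._last_audio.pop(session_id, None)

    def sweep(self) -> List[str]:
        """Return (and stop tracking) sessions that went quiet; the caller force-closes them."""
        now = self._clock()
        stale = [sid for sid, last in self._last_audio.items() if now - last > self.timeout_s]
        for sid in stale:
            del self._last_audio[sid]
        return stale


fake_now = [0.0]
watchdog = SilenceWatchdog(timeout_s=15, clock=lambda: fake_now[0])
watchdog.on_audio("call-a")
fake_now[0] = 10.0
watchdog.on_audio("call-b")
fake_now[0] = 16.0
print(watchdog.sweep())  # ['call-a']
```

Whatever closes the session should also release its concurrency slot (the gate counter or the active_calls row); otherwise the watchdog frees nothing.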
12. Firewall Rules for RTP Port Range
P1
What: Ensure the VPS firewall allows the full RTP media port range.

Current: Ports 10000 to 12000 are open for RTP. If we increase concurrent calls, verify this range is sufficient.

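Rule of thumb: Budget 2 ports per call (by convention, one even port for RTP audio and the next odd port for RTCP control traffic; a single port can suffice if RTCP is multiplexed with RTP). With 50 concurrent calls, we need at least 100 ports. 10000 to 12000 gives us about 2,000 ports (roughly 1,000 calls at 2 ports each), so plenty for now.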
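The same arithmetic as a tiny helper, handy if the range or the ports-per-call assumption changes. Both ends of the range are treated as inclusive; the function names are hypothetical.

```python
def ports_in_range(first_port: int, last_port: int) -> int:
    return last_port - first_port + 1  # both ends inclusive


def max_concurrent_calls(first_port: int, last_port: int, ports_per_call: int = 2) -> int:
    return ports_in_range(first_port, last_port) // ports_per_call


print(ports_in_range(10000, 12000), max_concurrent_calls(10000, 12000), 50 * 2)  # 2001 1000 100
```

In practice the GPU limit (3 to 5 sessions per card) will bind long before the port range does.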