Vocale AI Architecture Map

Last updated: April 11, 2026
Current Architecture (What Exists Today)
1. User Plane (Dashboard)
Next.js 16 + React 19 + Supabase + LiveKit Client | Vercel
Built
Auth (NextAuth)
Voice Agent Page
Knowledge Base (PDF upload)
Customer DB Connect
Field Mapping Editor
Tickets Page
Operations / Logs
Settings Page
Org Management (basic)
SIP Number Config UI
Multi-Number Management
Call Analytics Dashboard
2. Media Plane (AI Brain)
Python + vLLM (Qwen3-8B) + LiveKit Agents + Docker | Vast.ai GPU
Built
STT (Faster-Whisper)
LLM (vLLM / Qwen3-8B)
TTS (Kokoro-82M)
Agent Orchestration
Tool Calling (RAG + Tickets)
Barge-in / Interruption
Conversation Memory (Redis)
Filler Phrases
Tenant Config Loading
SIP Session Creation
Multi-Number Org Routing
Concurrent Call Handling
3. RAG Plane
Python + Pinecone + Docling | Vast.ai
Built
PDF Ingestion
OCR (Tesseract)
Chunking + Embedding
Vector Search (Pinecone)
Namespace Isolation
4. Customer DB Plane
Python + MySQL + WireGuard | Vast.ai
Built
External DB Connect
Field Mapping
AES-256 Encryption
Paginated Queries
WireGuard Tunnel
5. Telephony Layer (SIP / PBX Integration)
LiveKit SIP + Yeastar PBX + SIP Trunks | VPS 72.62.70.61
Needs Work
LiveKit SIP Server (running)
Basic SIP Trunk (Yeastar)
Hardcoded dial 1000 routing
SIP Numbers Directory
SIP Trunk Configuration
Dynamic number-to-org routing
Multi-number per tenant
Concurrent call support
Dashboard SIP config UI
LiveKit SIP dispatch rules (dynamic)
IP whitelisting / security
6. Admin Plane (Internal Operations)
Separate Repository  ·  Next.js + Supabase | Internal Access Only
In Progress
Maintained in a separate private repository. Admins access this plane independently from the client-facing dashboard. All features below are actively being developed.
Project Documentation
Admin Dashboard UI
Agent Activity Logs
Agent Config Defaults
Agent Status Tracking
Consent Records
Conversation History
Customer Database Logs
Customer Field Mappings
Customer Directory
Data Export Requests
External DB Connections
PBX Configurations
User Profiles
Organization Records
SIP Trunk Management
Multi-Tenant PBX & SIP Config
Email Automation (Multi-Tenant)
Customer Data Upload (SQL / CSV / Manual)
Connected Planes — Interactive Map

[Diagram: Connected Planes Ecosystem. Nodes: Telephony (SIP Trunking); AI Brain / Media Plane (STT / LLM / TTS: Speech → Thinking → Voice); User Dashboard (Client & Tenant Control Panel); Knowledge Base (RAG); Customer Database (External Tools)]
PBX and SIP: The Simple Explanation for Developers

What is a PBX?

Think of a PBX as a router for phone calls, just like your home WiFi router sends internet traffic to the right device.

Developer Analogy: A PBX is like an API gateway (such as Nginx or Kong). When someone calls a phone number, the PBX checks its routing rules and decides where to send the call, just as Nginx reads the URL path and forwards the request to the right backend server.

The client (our stakeholder) owns a Yeastar P550 PBX. It is a physical box in their office that manages all their phone lines. Businesses will get phone numbers from the client. When a customer dials one of those numbers, the Yeastar PBX forwards the call to our AI voice agent.

What is a SIP Trunk?

SIP (Session Initiation Protocol) is the language phones use to talk to each other. A SIP trunk is like an HTTP connection between two servers.

Developer Analogy: Imagine your Next.js app needs to talk to an external API. You configure the base URL and port. A SIP trunk is the same thing but for phone calls. The Yeastar PBX is configured with our server's IP (72.62.70.61) and port (15060). When a call comes in, it sends an INVITE (like an HTTP POST request) to that address.

We currently have a "Peer Trunk" set up. This means: no username/password login needed. The PBX just sends calls directly to our IP address. Trust is based on IP, not on credentials.

What is the INVITE / Call-ID / SDP?

When a call starts, the PBX sends a SIP INVITE message. Think of it as the phone call equivalent of a new HTTP request.

Developer Analogy:

INVITE = POST request saying "I want to start a call"
Call-ID = A unique request ID (like a UUID in your API). Each call gets its own Call-ID so the system knows which call is which, even when 10 calls happen at the same time.
SDP (Session Description Protocol) = The request body / payload. It contains: "I want to send audio using codec X, send it to my IP on port Y."
200 OK response = Our server's response saying: "Cool, I accept. Send YOUR audio to MY IP on port Z."
BYE = DELETE request. "This call is over, hang up."
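
To make the analogy concrete, here is a minimal Python sketch that pulls the same fields out of a raw INVITE. The sample message is made up and trimmed (real INVITEs carry more headers), and parse_invite is a hypothetical helper for illustration only. In our system LiveKit does this parsing for us.

```python
import re

# A trimmed, made-up INVITE for illustration; real messages carry more headers.
SAMPLE_INVITE = (
    "INVITE sip:+390551234567@72.62.70.61:15060 SIP/2.0\r\n"
    "From: <sip:+393331112222@pbx.example.com>;tag=abc123\r\n"
    "To: <sip:+390551234567@72.62.70.61>\r\n"
    "Call-ID: 7f3a9c1e-example@pbx.example.com\r\n"
    "CSeq: 1 INVITE\r\n"
    "Content-Type: application/sdp\r\n"
    "\r\n"
    "v=0\r\n"
    "c=IN IP4 203.0.113.10\r\n"
    "m=audio 20000 RTP/AVP 0 8\r\n"
)


def parse_invite(raw: str) -> dict:
    """Extract the method, dialed number, Call-ID and offered RTP port."""
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    method, request_uri, _version = lines[0].split(" ", 2)

    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    # The user part of the request URI is the number that was dialed.
    called = re.match(r"sips?:([^@;]+)", request_uri)
    # The SDP "m=audio" line names the port the sender wants audio on.
    media = re.search(r"^m=audio (\d+) ", body, re.MULTILINE)

    return {
        "method": method,
        "called_number": called.group(1) if called else None,
        "call_id": headers.get("call-id"),
        "rtp_port": int(media.group(1)) if media else None,
    }
```

Calling parse_invite(SAMPLE_INVITE) returns the method, the dialed number from the request line, the Call-ID header, and the audio port from the SDP body.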

What is RTP?

After the SIP "handshake" (INVITE + 200 OK), the actual voice audio flows over a different protocol called RTP (Real-time Transport Protocol).

Developer Analogy: SIP is like the REST API that sets up the call. RTP is like a WebSocket that carries the actual audio data. SIP negotiates "where" to send audio (IP + port). Then RTP streams the audio to that address using UDP packets.

Each active call needs its own RTP port. This is why the client's message mentions an "RTP port pool" (like 20000 to 20500). Each new call grabs a free port, so the audio streams never mix together.
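
As a rough illustration of what a port pool does internally, here is a small Python sketch. RtpPortPool is a hypothetical class, not LiveKit's implementation; it only shows the idea of handing each Call-ID its own port and returning it on hang-up.

```python
class RtpPortPool:
    """Hands out one free port per call and takes it back on hang-up."""

    def __init__(self, first: int, last: int) -> None:
        # Stored in descending order so that pop() yields the lowest port first.
        self._free = list(range(last, first - 1, -1))
        self._by_call: dict[str, int] = {}

    def acquire(self, call_id: str) -> int:
        # A call that already holds a port keeps it.
        if call_id in self._by_call:
            return self._by_call[call_id]
        if not self._free:
            raise RuntimeError("RTP port pool exhausted")
        port = self._free.pop()
        self._by_call[call_id] = port
        return port

    def release(self, call_id: str) -> None:
        # Releasing an unknown call is a no-op.
        port = self._by_call.pop(call_id, None)
        if port is not None:
            self._free.append(port)
```

With a pool of 20000 to 20500, two simultaneous calls get two different ports, so their audio streams never share a socket.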

How Does This Map to Our System?

Here is the good news: LiveKit already handles most of the SIP complexity. The LiveKit SIP service acts as our SIP server. The flow is:

1. LiveKit SIP listens for INVITE messages on port 15060
2. It manages Call-IDs and RTP ports automatically
3. It creates a LiveKit "Room" for each call
4. It converts phone audio (G.711) to WebRTC audio (Opus)
5. Our Python agent joins the Room and talks to the caller

Key Insight
The client's message about RTP port pools and session management is describing what a SIP server must do internally. LiveKit already does this for us. What WE need to build is the routing logic: when a call comes in, figure out WHICH organization it belongs to, load THAT org's config, and connect THAT org's AI agent.

What About Concurrent Calls?

The client says one business can receive multiple calls at the same time. This is handled at two levels:

Level 1 (PBX side, client handles): The Yeastar PBX sends each call as a separate INVITE with a unique Call-ID. It does NOT confuse them. This is automatic.

Level 2 (Our side, we handle): LiveKit creates a separate Room for each call. Each Room gets its own agent instance. Currently, our session_gate.py allows only ONE active session per org per GPU. This is the bottleneck we need to fix.

The Current "Hardcoded 1000" Problem

Right now, our LiveKit SIP is set up with a single rule: "any call to number 1000 goes to the agent." This was fine for testing, but for production multi-tenant:

Problem: If Business A's customer calls +39-055-1234567 and Business B's customer calls +39-055-7654321, both numbers hit our server. But how does our agent know which business's knowledge base and customer database to use?

Solution: We need to look at the "called number" (the number the customer dialed) in the SIP INVITE, look it up in our sip_numbers table, find the matching organization_id, and load that org's config. This is the main routing problem we need to solve.
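
A minimal sketch of that lookup in Python, with an in-memory dict standing in for the sip_numbers table. The org IDs are made up, and the normalization step assumes numbers are stored without separators, which may not match the real table.

```python
import re
from typing import Optional

# Stand-in for rows of the sip_numbers table (values are made up).
SIP_NUMBERS = {
    "+390551234567": "org-business-a",
    "+390557654321": "org-business-b",
}


def normalize_number(raw: str) -> str:
    """Strip spaces, dashes and other separators; keep a leading '+'."""
    raw = raw.strip()
    digits = re.sub(r"\D", "", raw)
    return "+" + digits if raw.startswith("+") else digits


def resolve_org(called_number: str) -> Optional[str]:
    """Return the organization_id for a dialed number, or None if unknown."""
    return SIP_NUMBERS.get(normalize_number(called_number))
```

Here resolve_org("+39-055-1234567") resolves to Business A, while an unassigned number such as "1000" returns None, which the agent can treat as "reject or fall back".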

Complete Call Flow: End to End
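Participants: Customer, Yeastar PBX, LiveKit SIP, Our Agent, Supabase DB.
1. Customer → Yeastar PBX: Dials +39-055-1001
2. Yeastar PBX → LiveKit SIP: Incoming call request (+ unique call ID + audio setup info)
3. LiveKit SIP: Creates LiveKit Room
4. LiveKit SIP → Our Agent: Start AI Agent in Room (room carries the dialed number)
5. Our Agent → Supabase DB: Which business owns this number? (NEW)
6. Supabase DB → Our Agent: Business ID + number settings
7. Our Agent → Supabase DB: Load this business's settings
8. Supabase DB → Our Agent: Agent script, language, tools
9. Our Agent → Supabase DB: Start call session for this business
10. Our Agent → Customer: Agent answers call
11. Customer ↔ Our Agent: Two-way audio (RTP via PBX, WebRTC via LiveKit)
12. Our Agent: Knowledge Base + Customer Data (only this business's data)
13. Customer → Yeastar PBX: Hangs up
14. Yeastar PBX → LiveKit SIP: SIP BYE
15. LiveKit SIP → Our Agent: Room closes
16. Our Agent → Supabase DB: Session ended + log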
Reading This Diagram
Yellow arrows = The client's office phone system — not our code.
Blue arrows = Handled automatically by our call routing software.
Red arrows = New logic we need to build (matching a phone number to the right business).
Green arrows = Already exists but needs updating to support multiple businesses.
Purple = The live voice conversation (already working).
PBX & SIP — Simply Explained

What is a PBX?

Think of a PBX as a traffic controller for phone calls — just like a receptionist at a front desk who decides which department to transfer a caller to.

Simple analogy: Imagine your building has one main phone number, but 20 different departments inside. The PBX is the system that answers the main number and routes the call to the right desk.

Our client owns a Yeastar P550 — a physical phone routing box in their office. Businesses get phone numbers from the client. When a customer dials one of those numbers, the Yeastar routes the call to our AI voice agent.

What is a SIP Trunk?

A SIP Trunk is the digital phone line that connects two systems so they can exchange calls. Instead of a physical cable, it's a network connection between the client's office phone system and our server.

Simple analogy: Think of it like the address and phone number of a business — our server has a public address (IP 72.62.70.61, port 15060). The Yeastar is configured to send calls to that address whenever one comes in.

We use a "Peer Trunk" — meaning the PBX just sends calls directly to our server's address. No login required. Trust is based on the IP address of the sender.

What happens when a call starts?

When a customer dials a number, the phone system sends a "start call" request to our server. This request includes everything needed to begin the conversation.

Breaking it down:

"Start Call" Request (INVITE) = The system saying "someone wants to talk, please pick up"
Unique Call ID = Like a ticket number — every call gets its own ID so the system never confuses two simultaneous calls
Audio Setup Info (SDP) = Technical details like "send the voice audio to this address on this channel"
"OK, I accept" Response (200 OK) = Our server saying "got it, call connected, sending audio back"
"End Call" Signal (BYE) = Either side saying "the conversation is over, hang up"

How does actual voice audio travel?

Once the call is accepted, the live voice audio flows through a separate channel specifically designed for real-time audio streaming.

Simple analogy: The "start call" process is like booking a phone call. The audio streaming is the actual conversation itself — it needs its own dedicated line so it isn't delayed or interrupted.

Each active call needs its own dedicated channel. This is why multiple calls can happen at the same time without the voices mixing together — each gets its own private audio lane.

What does our system actually do with a call?

Here is the good news: most of the technical phone infrastructure is handled for us automatically. Our call routing software (LiveKit) acts as the SIP server. It:

1. Listens for incoming calls on our server
2. Automatically assigns each call a unique tracking ID
3. Creates a private virtual "room" for each call
4. Converts the phone audio format to the format our AI can process
5. Our AI agent joins the room and begins the conversation with the caller

The Key Remaining Work
The technical call-handling is already in place. What we still need to build is the routing logic: when a call comes in, figure out which business it belongs to (based on the number dialed), load that business's agent settings, and connect the right AI. This is the main engineering task ahead.

How do multiple businesses share the same system?

This is the core challenge — the same server receives calls for many different businesses. Each call must be handled with the right agent, knowledge base, and customer data for that specific business.

What the client's phone system does (automatic): When a call comes in, the Yeastar sends each call separately with a unique ID. It never confuses two calls — this is handled for us.

What our system must do (the part we're building): We create a separate virtual room for each call. Each room gets its own AI agent instance. Currently, our system only allows one active call per business at a time — this is the limit we need to remove.

The "Single Number" Problem We're Fixing

Right now, our system is configured to only accept calls to one specific test number. This worked for development, but for a real product with multiple business clients:

The problem: If Business A's customer calls +39-055-1234567 and Business B's customer calls +39-055-7654321, both calls arrive at our server. But our agent currently has no way to tell which business the call is for.

The fix: We need to look at which number the customer dialed, match it to the correct business in our database, and load that business's agent, knowledge base, and customer data. This is the main routing problem we are solving.

Developer Task Breakdown: What to Build Next
Summary
The core AI agent (STT, LLM, TTS, RAG, Customer DB) is done. What remains is making the SIP telephony layer multi-tenant aware and building the dashboard UI for managing phone numbers.
Phase 1: SIP Multi-Tenant Routing (Backend, High Priority)
1. Update LiveKit SIP Dispatch Rules for Dynamic Routing
P0
What: Currently LiveKit has a single dispatch rule matching only "1000". We need to change this so it accepts ANY called number from the Yeastar PBX and creates a room with that number in the metadata.

Where: LiveKit SIP configuration (likely via LiveKit API or config YAML). Check docs/Yeastar-setup.md for the current config.

How: Instead of matching called_number: "1000", configure a wildcard or regex match. The dispatch rule should put the called_number and caller_number into the Room metadata so the agent can read it.

Simple example: Think of it like changing a route from /api/agent/1000 to /api/agent/:phoneNumber. The number becomes a variable, not a hardcoded value.

Reference: LiveKit SIP dispatch rules docs: https://docs.livekit.io/agents/sip/
2. Build Number-to-Organization Lookup in Agent
P0
What: When the agent joins a SIP room, it reads the called_number from room metadata, queries sip_numbers table, and gets the organization_id.

Where: services/supabase_client.py has lookup_org_by_sip_number() already. Check if it works properly and returns the right data including any per-number config overrides.

Where (agent): agent.py and voice_agent.py. The agent's entrypoint function needs to: (1) check if it's a SIP call, (2) extract called_number, (3) call lookup, (4) use that org_id for everything.

Simple example: Like a middleware in Express.js that reads the API key from the request header, looks up the tenant, and attaches req.tenant for all downstream handlers.
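
The four steps could look roughly like this in Python. This is a sketch under assumptions: it assumes the room metadata is a JSON string with called_number and caller_number keys (as proposed in task 1), and it takes the lookup as an injected callable because the real signature of lookup_org_by_sip_number() is not shown here. CallRoute and route_call are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CallRoute:
    organization_id: str
    called_number: str
    caller_number: Optional[str]


def route_call(
    room_metadata: str,
    lookup: Callable[[str], Optional[str]],
) -> Optional[CallRoute]:
    """Return routing info for a SIP call, or None if it cannot be routed."""
    try:
        meta = json.loads(room_metadata) if room_metadata else {}
    except json.JSONDecodeError:
        return None
    if not isinstance(meta, dict):
        return None

    # (1) + (2): a room without a called_number is not treated as a SIP call.
    called = meta.get("called_number")
    if not called:
        return None

    # (3): map the dialed number to an organization.
    org_id = lookup(called)
    if org_id is None:
        return None

    # (4): everything downstream uses this org_id.
    return CallRoute(org_id, called, meta.get("caller_number"))
```

Returning None for unknown numbers keeps the decision about what to do (hang up, play a message, log) in the entrypoint rather than in the lookup.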
3. Remove Single-Session Gate for SIP Calls
P0
What: services/session_gate.py currently enforces ONE active session per organization per GPU. For SIP calls, we need to allow multiple concurrent calls per org (a restaurant might get 5 calls at once).

How: Change the gate from "1 session per org" to "N sessions per org" with a configurable limit. Better still, track the per-org concurrent count and reject new calls with a SIP 486 (Busy Here) when the limit is reached.

Consideration: GPU memory is limited. Each concurrent call needs its own STT + LLM context + TTS. For a single RTX 3090, maybe 3 to 5 concurrent sessions max. This should be configurable per org or per GPU.
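
One possible shape for an "N sessions per org" gate, sketched in Python. SessionGate is a hypothetical in-memory class, not the current contents of services/session_gate.py, and the default limit of 3 is only a placeholder taken from the range above. It uses a threading lock; an asyncio-based agent or a multi-process deployment would need a different primitive (for example Redis).

```python
import threading
from collections import defaultdict


class SessionGate:
    """Allows up to N concurrent sessions per organization."""

    def __init__(self, default_limit: int = 3) -> None:
        self._default_limit = default_limit
        self._limits: dict[str, int] = {}
        self._active: dict[str, int] = defaultdict(int)
        self._lock = threading.Lock()

    def set_limit(self, org_id: str, limit: int) -> None:
        with self._lock:
            self._limits[org_id] = limit

    def try_acquire(self, org_id: str) -> bool:
        """Reserve a slot; False means the caller should be rejected as busy."""
        with self._lock:
            limit = self._limits.get(org_id, self._default_limit)
            if self._active[org_id] >= limit:
                return False
            self._active[org_id] += 1
            return True

    def release(self, org_id: str) -> None:
        """Free a slot when the call ends; never drops below zero."""
        with self._lock:
            if self._active[org_id] > 0:
                self._active[org_id] -= 1
```

The check and the increment happen under one lock so that two calls arriving at the same moment cannot both take the last slot.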
4. Per-Number Config Override Loading
P1
What: When a call comes to number X, the agent should: (1) load org-level config from agent_instances, (2) check if the sip_numbers row has a config_override JSONB, (3) merge the override on top of the org config.

Where: services/tenant_config.py. The TenantConfig dataclass needs a merge method that accepts per-number overrides.

Why: Business has number 1001 for orders (greeting: "Ready to take your order") and 1002 for support (greeting: "How can I help?"). Same org, different agent behavior.
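
A minimal sketch of such a merge method in Python. The three fields shown are assumptions for illustration; the real TenantConfig in services/tenant_config.py likely has more. Unknown keys and null values in the override are ignored so a stray JSONB key cannot break a call.

```python
from dataclasses import dataclass, fields, replace
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class TenantConfig:
    # Hypothetical fields; the real dataclass may differ.
    greeting: str
    language: str
    system_prompt: str

    def merged_with(self, override: Optional[Mapping[str, Any]]) -> "TenantConfig":
        """Return a copy with per-number overrides applied on top."""
        if not override:
            return self
        known = {f.name for f in fields(self)}
        # Keep only keys that are real config fields and have a non-null value.
        updates = {k: v for k, v in override.items() if k in known and v is not None}
        return replace(self, **updates)
```

For the example above, the org-level config keeps its language and system prompt, and only the greeting changes for number 1001.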
Phase 2: Database Schema Updates
5. Enhance sip_numbers Table
P0
Current state: sip_numbers table exists with basic columns (phone_number, organization_id, active).

New columns needed:
- label (text) = Human-friendly name like "Sales Line"
- config_override (jsonb) = Per-number agent config overrides
- assigned_by (uuid, nullable) = Who assigned this number (for audit)
- assigned_at (timestamptz) = When it was assigned
- max_concurrent_calls (int, default 1) = How many simultaneous calls this number can handle

Index: Add unique index on phone_number (one number can only belong to one org at a time).
RLS: Update Row Level Security policies so tenants can only see their own numbers.
6. Add Concurrent Sessions Tracking Table
P1
What: Create an active_calls table (or use Redis) to track which calls are currently active per org and per number.

Columns:
- id (uuid)
- session_id (fk to sessions)
- organization_id (fk to organizations)
- sip_number_id (fk to sip_numbers)
- started_at (timestamptz)
- call_id_sip (text) = The SIP Call-ID for debugging

Purpose: Quick COUNT query to check concurrent calls before accepting a new call. Rows are deleted when calls end.
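
The count-then-insert check can be sketched as follows. This uses an in-memory SQLite database purely as a stand-in so the example runs anywhere; the production table would live in Supabase (Postgres). The session_id column is left out for brevity, and try_start_call / end_call are hypothetical helper names.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the active_calls table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE active_calls (
               id INTEGER PRIMARY KEY,
               organization_id TEXT NOT NULL,
               sip_number_id TEXT NOT NULL,
               call_id_sip TEXT NOT NULL,
               started_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    return conn


def try_start_call(conn, org_id: str, number_id: str, call_id: str, limit: int) -> bool:
    """Insert a row for the call unless the number is already at its limit."""
    with conn:  # commits on success, rolls back on error
        (active,) = conn.execute(
            "SELECT COUNT(*) FROM active_calls WHERE sip_number_id = ?",
            (number_id,),
        ).fetchone()
        if active >= limit:
            return False
        conn.execute(
            "INSERT INTO active_calls (organization_id, sip_number_id, call_id_sip)"
            " VALUES (?, ?, ?)",
            (org_id, number_id, call_id),
        )
        return True


def end_call(conn, call_id: str) -> None:
    """Delete the row when the call ends, freeing the slot."""
    with conn:
        conn.execute("DELETE FROM active_calls WHERE call_id_sip = ?", (call_id,))
```

With several agent workers writing at once, the count and the insert would need to be made atomic on the database side (for example with a lock or a stricter isolation level); the sketch does not cover that.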
Phase 3: Dashboard UI (Frontend)
7. SIP Numbers Management Page
P0
What: A new page or section in Settings where the business can see their assigned phone numbers and configure each one.

Features:
- Table showing: phone number, label, status (active/inactive), concurrent call limit
- Click a number to edit: label, greeting override, language override, system prompt override
- Show live call count (how many active calls on this number right now)
- Toggle active/inactive

Note: The actual number ASSIGNMENT is done by the client (stakeholder) via an admin panel or API. The business tenant can only VIEW and CONFIGURE their assigned numbers, not create new ones.

Where: src/app/(dashboard)/settings/ or a new src/app/(dashboard)/phone-numbers/ route.
8. Admin Panel: Number Assignment (for Client/Stakeholder)
P1
What: A super-admin interface where the client (who owns the PBX) can:
- Add new phone numbers to the system
- Assign numbers to organizations
- Reassign numbers between organizations
- See which numbers are in use across all tenants

Who uses this: Only the client/stakeholder, not the business tenants.

Consideration: Could be a separate admin route (e.g., /admin/numbers) or an API-only feature initially.
9. Call Analytics / Live Call View
P2
What: The Operations page should show:
- Currently active calls (live count) per number
- Call history with caller number, duration, tickets created
- Simple analytics: calls per day, average duration, busiest hours

Where: Enhance existing src/app/(dashboard)/operations/page.tsx
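
The "simple analytics" above amount to a few aggregations. A small Python sketch, assuming each call record has an ISO start timestamp and a duration in seconds (the field names started_at and duration_s are hypothetical, and the sample rows are made up):

```python
from collections import Counter
from datetime import datetime

# Made-up call records for illustration.
CALLS = [
    {"started_at": "2026-04-10T09:15:00", "duration_s": 120},
    {"started_at": "2026-04-10T09:40:00", "duration_s": 60},
    {"started_at": "2026-04-11T14:05:00", "duration_s": 180},
]


def summarize(calls: list) -> dict:
    """Compute calls per day, average duration and the busiest hour of day."""
    starts = [datetime.fromisoformat(c["started_at"]) for c in calls]
    per_day = Counter(s.date().isoformat() for s in starts)
    per_hour = Counter(s.hour for s in starts)
    total = sum(c["duration_s"] for c in calls)
    return {
        "calls_per_day": dict(per_day),
        "average_duration_s": total / len(calls) if calls else 0.0,
        "busiest_hour": per_hour.most_common(1)[0][0] if calls else None,
    }
```

In production the same numbers would more likely come from SQL GROUP BY queries; the sketch only shows what the page needs to compute.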
Phase 4: Security & Production Readiness
10. SIP IP Whitelisting
P0
What: Currently, anyone who knows our IP (72.62.70.61:15060) can send SIP calls. We need to restrict this to ONLY the Yeastar PBX's IP address.

How: Two approaches:
- Firewall level: iptables/ufw rule on the VPS to DROP any UDP to port 15060 that does not come from the Yeastar's public IP
- LiveKit level: Configure the SIP trunk's allowed_addresses to only accept the PBX IP

The client's message explicitly mentions this: "The bot's SIP listener should silently drop any UDP packets on port 15060 that do not originate from the Yeastar PBX's designated IP address."
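
The allowlist decision itself is simple. Here is a Python sketch of the check using the standard ipaddress module. The address 203.0.113.10 is a documentation placeholder, not the Yeastar's real IP, and the actual enforcement should still happen at the firewall or in LiveKit rather than in application code.

```python
import ipaddress


def build_allowlist(entries: list) -> list:
    """Parse single IPs or CIDR ranges into network objects."""
    return [ipaddress.ip_network(e, strict=False) for e in entries]


def is_allowed(source_ip: str, allowlist: list) -> bool:
    """True only if the packet's source address is on the allowlist."""
    try:
        addr = ipaddress.ip_address(source_ip)
    except ValueError:
        # Anything that is not a valid IP is rejected.
        return False
    return any(addr in net for net in allowlist)


# Placeholder for the PBX's public IP.
PBX_ALLOWLIST = build_allowlist(["203.0.113.10"])
```

Packets for which is_allowed() returns False correspond to the ones the client wants silently dropped.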
11. Graceful Call Rejection (Capacity Limits)
P1
What: If the GPU is at capacity (all concurrent slots used), new calls should get a proper SIP rejection (486 Busy Here or 503 Service Unavailable) instead of just hanging.

How: Before the agent joins a Room, check concurrent call count. If over limit, do not join and let LiveKit handle the SIP rejection. Or configure LiveKit dispatch rules with a max participant limit.

Also needed: Ghost session cleanup. If a call drops without a SIP BYE (network failure), detect the silence after 10 to 15 seconds and force-close the session to free the slot.
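
Ghost session cleanup can be sketched as a small watchdog in Python. SilenceWatchdog is a hypothetical class: it assumes the agent calls touch() whenever inbound audio arrives for a session and calls reap() periodically. The clock is injectable so the logic can be tested without waiting.

```python
import time
from typing import Callable


class SilenceWatchdog:
    """Flags sessions that have received no inbound audio for too long."""

    def __init__(
        self,
        timeout_s: float = 15.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._timeout_s = timeout_s
        self._clock = clock
        self._last_audio: dict[str, float] = {}

    def touch(self, session_id: str) -> None:
        """Record that audio was just received for this session."""
        self._last_audio[session_id] = self._clock()

    def forget(self, session_id: str) -> None:
        """Stop tracking a session that ended normally (SIP BYE)."""
        self._last_audio.pop(session_id, None)

    def reap(self) -> list:
        """Return and stop tracking sessions silent for at least the timeout."""
        now = self._clock()
        stale = [
            sid for sid, seen in self._last_audio.items()
            if now - seen >= self._timeout_s
        ]
        for sid in stale:
            del self._last_audio[sid]
        return stale
```

Each session ID returned by reap() is a candidate for force-closing so its concurrency slot is freed.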
12. Firewall Rules for RTP Port Range
P1
What: Ensure the VPS firewall allows the full RTP media port range.

Current: Ports 10000 to 12000 are open for RTP. If we increase concurrent calls, verify this range is sufficient.

Rule of thumb: Budget 2 UDP ports per call (conventionally one for RTP audio and the adjacent one for RTCP control). With 50 concurrent calls, we need at least 100 ports. 10000 to 12000 gives us about 2000 ports, so plenty for now.
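
The arithmetic behind the rule of thumb, as a tiny Python helper (the function names are hypothetical, and 2 ports per call is the budgeting assumption stated above):

```python
def rtp_ports_needed(concurrent_calls: int, ports_per_call: int = 2) -> int:
    """Ports required for a given number of simultaneous calls."""
    return concurrent_calls * ports_per_call


def max_calls_for_range(first_port: int, last_port: int, ports_per_call: int = 2) -> int:
    """How many simultaneous calls an inclusive port range can carry."""
    return (last_port - first_port + 1) // ports_per_call
```

For 50 calls this gives 100 ports, and the open range 10000 to 12000 works out to room for 1000 simultaneous calls, far above what the GPU can serve.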
Active Development

Features Currently In Progress

These items are actively being worked on by our development team. Updates will be reflected here as work advances.

3 In Progress
2 New & In Progress
Last updated: April 13, 2026
1. Project Documentation
Architecture maps, developer guides, API references & onboarding materials
In Progress
Comprehensive documentation covering the full Vocale AI system architecture, including plane-by-plane breakdowns, API endpoint references, deployment runbooks, and a developer onboarding guide. This page you are currently viewing is part of that effort.
Architecture Map · API Reference · Developer Onboarding · Deployment Runbook
2. Admin Dashboard UI
Internal admin interface for managing all platform data tables and configurations
New
A full-featured admin dashboard providing CRUD interfaces and visibility into all core platform tables. Admins will be able to inspect, manage and audit data across every tenant from a single secure panel.
agent_activity_log · agent_config_defaults · agent_state · consent_records · conversation_turns · customer_db_logs · customer_field_mappings · customers · data_export_requests · external_db_connections · pbx_configurations · profiles · organizations · sip_trunks
3. Multi-Tenant PBX & SIP Configuration
Per-tenant SIP trunk provisioning, PBX config management & dynamic routing
In Progress
Building the full multi-tenant telephony layer: dynamic SIP dispatch rules, per-organization trunk assignments, dashboard configuration UI for PBX settings, IP whitelisting, and concurrent call routing. This resolves the current single-tenant hardcoded routing and unlocks true SaaS-grade telephony.
Dynamic SIP Routing · Per-Org Trunk Assignment · PBX Config UI · IP Whitelisting · Concurrent Call Handling
4. Email Automation (Multi-Tenant)
Automated email workflows, triggers & delivery pipelines scoped per organization
New
A multi-tenant email automation module allowing each organization to configure triggered emails (e.g., post-call summaries, follow-ups, ticket notifications). Includes template management, SMTP/provider integration, delivery tracking, and per-tenant isolation of email rules and logs.
Email Triggers · Template Management · SMTP / Provider Integration · Delivery Tracking · Per-Tenant Isolation
5. Customer Data Upload (SQL / CSV / Manual)
SaaS dashboard tools to import & manage customer records across all input formats
New
Enabling tenants to populate their customer data through three channels directly from the SaaS dashboard: SQL query import from external databases, CSV file upload with column mapping, and a manual entry form. Includes validation, field mapping, conflict resolution, and audit logs for all import operations.
SQL Import · CSV Upload · Manual Entry Form · Column Mapping · Import Audit Logs
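
For the CSV channel, column mapping plus validation could look roughly like this in Python. This is a sketch only: import_customers is a hypothetical helper, the internal field names (name, phone) are assumptions, and the line numbers it reports assume no multi-line quoted fields.

```python
import csv
import io


def import_customers(csv_text: str, column_map: dict, required: tuple = ("name",)):
    """Map CSV columns to internal fields and split rows into valid / rejected.

    column_map maps a CSV header to an internal field name.
    Returns (rows, errors) where errors is a list of (line_number, reason).
    """
    rows, errors = [], []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Data starts on line 2 because line 1 is the header.
    for line_no, raw in enumerate(reader, start=2):
        record = {
            field: (raw.get(col) or "").strip()
            for col, field in column_map.items()
        }
        missing = [f for f in required if not record.get(f)]
        if missing:
            errors.append((line_no, "missing " + ", ".join(missing)))
            continue
        rows.append(record)
    return rows, errors
```

Rejected rows are reported with their line number instead of aborting the whole import, which fits the validation and audit-log requirements described above.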