Vocale AI - Architecture Map & Developer Guide

Current Architecture (What Exists Today)

1. User Plane (Dashboard)

Next.js 16 + React 19 + Supabase + LiveKit Client | Vercel

Built

Auth (NextAuth)

Voice Agent Page

Knowledge Base (PDF upload)

Customer DB Connect

Field Mapping Editor

Tickets Page

Operations / Logs

Settings Page

Org Management (basic)

SIP Number Config UI

Multi-Number Management

Call Analytics Dashboard

2. Media Plane (AI Brain)

Python + vLLM (Qwen3-8B) + LiveKit Agents + Docker | Vast.ai GPU

Built

STT (Faster-Whisper)

LLM (vLLM / Qwen3-8B)

TTS (Kokoro-82M)

Agent Orchestration

Tool Calling (RAG + Tickets)

Barge-in / Interruption

Conversation Memory (Redis)

Filler Phrases

Tenant Config Loading

SIP Session Creation

Multi-Number Org Routing

Concurrent Call Handling

3. RAG Plane

Python + Pinecone + Docling | Vast.ai

Built

PDF Ingestion

OCR (Tesseract)

Chunking + Embedding

Vector Search (Pinecone)

Namespace Isolation

4. Customer DB Plane

Python + MySQL + WireGuard | Vast.ai

Built

External DB Connect

Field Mapping

AES-256 Encryption

Paginated Queries

WireGuard Tunnel

5. Telephony Layer (SIP / PBX Integration)

LiveKit SIP + Yeastar PBX + SIP Trunks | VPS 72.62.70.61

Needs Work

LiveKit SIP Server (running)

Basic SIP Trunk (Yeastar)

Hardcoded dial 1000 routing

SIP Numbers Directory

SIP Trunk Configuration

Dynamic number-to-org routing

Multi-number per tenant

Concurrent call support

Dashboard SIP config UI

LiveKit SIP dispatch rules (dynamic)

IP whitelisting / security

6. Admin Plane (Internal Operations)

Separate Repository · Next.js + Supabase | Internal Access Only

In Progress

Maintained in a separate private repository. Admins access this plane independently from the client-facing dashboard. All features below are actively being developed.

Project Documentation

Admin Dashboard UI

Agent Activity Logs

Agent Config Defaults

Agent Status Tracking

Consent Records

Conversation History

Customer Database Logs

Customer Field Mappings

Customer Directory

Data Export Requests

External DB Connections

PBX Configurations

User Profiles

Organization Records

SIP Trunk Management

Multi-Tenant PBX & SIP Config

Email Automation (Multi-Tenant)

Customer Data Upload (SQL / CSV / Manual)


Complete Call Flow: End to End

Reading This Diagram

Yellow arrows = The client's office phone system — not our code.
Blue arrows = Handled automatically by our call routing software.
Red arrows = New logic we need to build (matching a phone number to the right business).
Green arrows = Already exists but needs updating to support multiple businesses.
Purple = The live voice conversation (already working).

PBX & SIP — Simply Explained

What is a PBX?

Think of a PBX as a traffic controller for phone calls — just like a receptionist at a front desk who decides which department to transfer a caller to.

Simple analogy: Imagine your building has one main phone number, but 20 different departments inside. The PBX is the system that answers the main number and routes the call to the right desk.

Our client owns a Yeastar P550 — a physical phone routing box in their office. Businesses get phone numbers from the client. When a customer dials one of those numbers, the Yeastar routes the call to our AI voice agent.

What is a SIP Trunk?

A SIP Trunk is the digital phone line that connects two systems so they can exchange calls. Instead of a physical cable, it's a network connection between the client's office phone system and our server.

Simple analogy: Think of it like the address and phone number of a business — our server has a public address (IP 72.62.70.61, port 15060). The Yeastar is configured to send calls to that address whenever one comes in.

We use a "Peer Trunk" — meaning the PBX just sends calls directly to our server's address. No login required. Trust is based on the IP address of the sender.

What happens when a call starts?

When a customer dials a number, the phone system sends a "start call" request to our server. This request includes everything needed to begin the conversation.

Breaking it down:

"Start Call" Request (INVITE) = The system saying "someone wants to talk, please pick up"
Unique Call ID = Like a ticket number — every call gets its own ID so the system never confuses two simultaneous calls
Audio Setup Info (SDP) = Technical details like "send the voice audio to this address on this channel"
"OK, I accept" Response (200 OK) = Our server saying "got it, call connected, sending audio back"
"End Call" Signal (BYE) = Either side saying "the conversation is over, hang up"

How does actual voice audio travel?

Once the call is accepted, the live voice audio flows through a separate channel specifically designed for real-time audio streaming.

Simple analogy: The "start call" process is like booking a phone call. The audio streaming is the actual conversation itself — it needs its own dedicated line so it isn't delayed or interrupted.

Each active call needs its own dedicated channel. This is why multiple calls can happen at the same time without the voices mixing together — each gets its own private audio lane.
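To make the INVITE, Call-ID and SDP pieces concrete, here is a minimal Python sketch that pulls them out of a start-call request. The sample message is hand-written for illustration (the sender address 203.0.113.10 is a documentation placeholder, not the real PBX), and summarize_invite is a hypothetical helper. In production LiveKit does this parsing for us.

```python
# Illustrative only: a hand-written INVITE, not a capture from the Yeastar.
SAMPLE_INVITE = (
    "INVITE sip:1000@72.62.70.61:15060 SIP/2.0\r\n"
    "Via: SIP/2.0/UDP 203.0.113.10:5060;branch=z9hG4bK776asdhds\r\n"
    "From: <sip:+390551112222@203.0.113.10>;tag=1928301774\r\n"
    "To: <sip:1000@72.62.70.61>\r\n"
    "Call-ID: a84b4c76e66710@203.0.113.10\r\n"
    "CSeq: 314159 INVITE\r\n"
    "Content-Type: application/sdp\r\n"
    "\r\n"
    "v=0\r\n"
    "c=IN IP4 203.0.113.10\r\n"
    "m=audio 10020 RTP/AVP 0 8\r\n"
)


def summarize_invite(raw: str) -> dict:
    """Extract the fields this guide talks about from a raw SIP request."""
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    method, request_uri, _version = lines[0].split(" ", 2)

    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    # The dialed number is the user part of the request URI (sip:<user>@<host>).
    called_number = request_uri.split(":", 1)[1].split("@", 1)[0]

    # The SDP "m=audio" line names the port where the sender expects audio.
    audio_port = None
    for sdp_line in body.split("\r\n"):
        if sdp_line.startswith("m=audio "):
            audio_port = int(sdp_line.split(" ")[1])

    return {
        "method": method,
        "called_number": called_number,
        "call_id": headers.get("call-id"),
        "audio_port": audio_port,
    }
```

The Call-ID is what keeps two simultaneous calls apart, and the audio port in the SDP is that call's private audio lane.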

What does our system actually do with a call?

Here is the good news: most of the technical phone infrastructure is handled for us automatically. Our call routing software (LiveKit) acts as the SIP server. It:

1. Listens for incoming calls on our server
2. Automatically assigns each call a unique tracking ID
3. Creates a private virtual "room" for each call
4. Converts the phone audio format to the format our AI can process
5. Hands the room to our AI agent, which joins and begins the conversation with the caller

The Key Remaining Work

The technical call-handling is already in place. What we still need to build is the routing logic: when a call comes in, figure out which business it belongs to (based on the number dialed), load that business's agent settings, and connect the right AI. This is the main engineering task ahead.

How do multiple businesses share the same system?

This is the core challenge — the same server receives calls for many different businesses. Each call must be handled with the right agent, knowledge base, and customer data for that specific business.

What the client's phone system does (automatic): When a call comes in, the Yeastar sends each call separately with a unique ID. It never confuses two calls — this is handled for us.

What our system must do (the part we're building): We create a separate virtual room for each call. Each room gets its own AI agent instance. Currently, our system only allows one active call per business at a time — this is the limit we need to remove.

The "Single Number" Problem We're Fixing

Right now, our system is configured to only accept calls to one specific test number (1000). This worked for development, but it breaks down for a real product with multiple business clients:

The problem: If Business A's customer calls +39-055-1234567 and Business B's customer calls +39-055-7654321, both calls arrive at our server. But our agent currently has no way to tell which business the call is for.

The fix: We need to look at which number the customer dialed, match it to the correct business in our database, and load that business's agent, knowledge base, and customer data. This is the main routing problem we are solving.

Developer Task Breakdown: What to Build Next

Summary

The core AI agent (STT, LLM, TTS, RAG, Customer DB) is done. What remains is making the SIP telephony layer multi-tenant aware and building the dashboard UI for managing phone numbers.

Phase 1: SIP Multi-Tenant Routing (Backend, High Priority)

1. Update LiveKit SIP Dispatch Rules for Dynamic Routing

What: Currently LiveKit has a single dispatch rule matching only "1000". We need to change this so it accepts ANY called number from the Yeastar PBX and creates a room with that number in the metadata.

Where: LiveKit SIP configuration (likely via LiveKit API or config YAML). Check docs/Yeastar-setup.md for the current config.

How: Instead of matching called_number: "1000", configure a wildcard or regex match. The dispatch rule should put the called_number and caller_number into the Room metadata so the agent can read it.

Simple example: Think of it like changing a route from /api/agent/1000 to /api/agent/:phoneNumber. The number becomes a variable, not a hardcoded value.

Reference: LiveKit SIP dispatch rules docs: https://docs.livekit.io/agents/sip/
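The change from a hardcoded match to a pattern match can be sketched in plain Python. This is not LiveKit's API or config format: CALLED_NUMBER_PATTERN, matches_dynamic and build_room_metadata are hypothetical names, and the accepted number shape (3 to 15 digits with an optional leading +) is an assumption to adjust to whatever the Yeastar actually sends.

```python
import json
import re

# Assumed number shape: short extensions such as "1001" and
# E.164-style numbers such as "+390551234567".
CALLED_NUMBER_PATTERN = re.compile(r"^\+?\d{3,15}$")


def matches_hardcoded(called_number: str) -> bool:
    """Today's behavior: only the test extension is accepted."""
    return called_number == "1000"


def matches_dynamic(called_number: str) -> bool:
    """Target behavior: any well-formed called number is accepted."""
    return CALLED_NUMBER_PATTERN.fullmatch(called_number) is not None


def build_room_metadata(called_number: str, caller_number: str) -> str:
    """Serialize the two numbers the agent needs to read later."""
    return json.dumps(
        {"called_number": called_number, "caller_number": caller_number},
        sort_keys=True,
    )
```

Whatever mechanism the dispatch rule ends up using, the agent only needs these two values to be readable from the room when it joins.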

2. Build Number-to-Organization Lookup in Agent

What: When the agent joins a SIP room, it reads the called_number from room metadata, queries sip_numbers table, and gets the organization_id.

Where: services/supabase_client.py has lookup_org_by_sip_number() already. Check if it works properly and returns the right data including any per-number config overrides.

Where (agent): agent.py and voice_agent.py. The agent's entrypoint function needs to: (1) check if it's a SIP call, (2) extract called_number, (3) call lookup, (4) use that org_id for everything.

Simple example: Like a middleware in Express.js that reads the API key from the request header, looks up the tenant, and attaches req.tenant for all downstream handlers.
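A minimal sketch of the four entrypoint steps, assuming the room metadata is a JSON string with a called_number key. The SIP_NUMBERS dict stands in for the sip_numbers table and resolve_org is a hypothetical helper. The real lookup is lookup_org_by_sip_number() in services/supabase_client.py, whose exact signature is not shown here.

```python
import json
from typing import Optional

# Stand-in for the sip_numbers table (phone_number -> organization_id).
SIP_NUMBERS = {
    "+390551234567": "org-a",
    "+390557654321": "org-b",
}


def normalize_number(raw: str) -> str:
    """Strip spaces and dashes so '+39-055-1234567' matches the stored form."""
    digits = "".join(ch for ch in raw if ch.isdigit())
    return "+" + digits if raw.strip().startswith("+") else digits


def resolve_org(room_metadata: str) -> Optional[str]:
    """Return the organization_id for a SIP room, or None if it cannot be resolved."""
    try:
        meta = json.loads(room_metadata or "{}")
    except json.JSONDecodeError:
        return None
    if not isinstance(meta, dict):
        return None

    called = meta.get("called_number")
    if not called:
        return None  # (1) not a SIP call, or the metadata is missing

    # (2) extract and normalize, (3) look up, (4) caller uses the org_id downstream.
    return SIP_NUMBERS.get(normalize_number(str(called)))
```

Returning None for unknown numbers lets the entrypoint decide whether to reject the call or fall back to a default agent.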

3. Remove Single-Session Gate for SIP Calls

What: services/session_gate.py currently enforces ONE active session per organization per GPU. For SIP calls, we need to allow multiple concurrent calls per org (a restaurant might get 5 calls at once).

How: Change the gate from "1 session per org" to "N sessions per org" with a configurable limit. Or better: track per-org concurrent count and reject new calls with a SIP 486 (Busy) when the limit is reached.

Consideration: GPU memory is limited. Each concurrent call needs its own STT + LLM context + TTS. For a single RTX 3090, maybe 3 to 5 concurrent sessions max. This should be configurable per org or per GPU.
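One way the "N sessions per org" gate could look, as an in-process sketch. ConcurrencyGate and its methods are hypothetical and do not reflect the current contents of services/session_gate.py. A multi-process or multi-GPU deployment would need a shared store (for example Redis) instead of a local dict.

```python
import threading
from collections import defaultdict


class ConcurrencyGate:
    """Tracks active sessions per organization against a configurable limit."""

    def __init__(self, default_limit: int = 1) -> None:
        self._default_limit = default_limit
        self._limits: dict = {}
        self._active = defaultdict(int)
        self._lock = threading.Lock()

    def set_limit(self, org_id: str, limit: int) -> None:
        with self._lock:
            self._limits[org_id] = limit

    def try_acquire(self, org_id: str) -> bool:
        """Reserve a slot. False means the caller should reject with 486 Busy."""
        with self._lock:
            limit = self._limits.get(org_id, self._default_limit)
            if self._active[org_id] >= limit:
                return False
            self._active[org_id] += 1
            return True

    def release(self, org_id: str) -> None:
        """Free a slot when the call ends. Never drops below zero."""
        with self._lock:
            if self._active[org_id] > 0:
                self._active[org_id] -= 1

    def active_count(self, org_id: str) -> int:
        with self._lock:
            return self._active[org_id]
```

Keeping the default limit at 1 preserves today's behavior for any org that has not been given a higher limit.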

4. Per-Number Config Override Loading

What: When a call comes to number X, the agent should: (1) load org-level config from agent_instances, (2) check if the sip_numbers row has a config_override JSONB, (3) merge the override on top of the org config.

Where: services/tenant_config.py. The TenantConfig dataclass needs a merge method that accepts per-number overrides.

Why: Business has number 1001 for orders (greeting: "Ready to take your order") and 1002 for support (greeting: "How can I help?"). Same org, different agent behavior.
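A sketch of the merge step, assuming a simplified TenantConfig. The field names below (greeting, language, system_prompt) are illustrative guesses, not the actual dataclass in services/tenant_config.py.

```python
from dataclasses import dataclass, fields, replace
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class TenantConfig:
    # Field names are assumptions for illustration.
    organization_id: str
    greeting: str
    language: str = "en"
    system_prompt: str = ""

    def merged_with(self, override: Optional[Mapping[str, Any]]) -> "TenantConfig":
        """Return a copy with per-number overrides applied on top of org config."""
        if not override:
            return self
        # Only known fields may be overridden, and never the owning organization.
        allowed = {f.name for f in fields(self)} - {"organization_id"}
        changes = {
            key: value
            for key, value in override.items()
            if key in allowed and value is not None
        }
        return replace(self, **changes)
```

For example, an org-level config with greeting "How can I help?" merged with {"greeting": "Ready to take your order"} yields the orders-line greeting while leaving language and system prompt untouched. Ignoring unknown keys keeps a malformed config_override from breaking a live call.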

Phase 2: Database Schema Updates

5. Enhance sip_numbers Table

Current state: sip_numbers table exists with basic columns (phone_number, organization_id, active).

New columns needed:
- label (text) = Human-friendly name like "Sales Line"
- config_override (jsonb) = Per-number agent config overrides
- assigned_by (uuid, nullable) = Who assigned this number (for audit)
- assigned_at (timestamptz) = When it was assigned
- max_concurrent_calls (int, default 1) = How many simultaneous calls this number can handle

Index: Add unique index on phone_number (one number can only belong to one org at a time).
RLS: Update Row Level Security policies so tenants can only see their own numbers.

6. Add Concurrent Sessions Tracking Table

What: Create an active_calls table (or use Redis) to track which calls are currently active per org and per number.

Columns:
- id (uuid)
- session_id (fk to sessions)
- organization_id (fk to organizations)
- sip_number_id (fk to sip_numbers)
- started_at (timestamptz)
- call_id_sip (text) = The SIP Call-ID for debugging

Purpose: Quick COUNT query to check concurrent calls before accepting a new call. Rows are deleted when calls end.
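The count, insert, delete lifecycle can be sketched with Python's built-in sqlite3 as a stand-in for the real database. The table here is simplified (no session_id, text instead of uuid and timestamptz), and try_start_call and end_call are hypothetical helpers.

```python
import sqlite3
import uuid
from typing import Optional


def open_db() -> sqlite3.Connection:
    """In-memory stand-in for the active_calls table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """
        CREATE TABLE active_calls (
            id TEXT PRIMARY KEY,
            organization_id TEXT NOT NULL,
            sip_number_id TEXT NOT NULL,
            call_id_sip TEXT,
            started_at TEXT DEFAULT CURRENT_TIMESTAMP
        )
        """
    )
    return db


def try_start_call(db, org_id: str, sip_number_id: str,
                   call_id_sip: str, limit: int) -> Optional[str]:
    """Insert a row if the number is under its limit. Returns the row id or None."""
    with db:  # commits on success, rolls back on error
        (count,) = db.execute(
            "SELECT COUNT(*) FROM active_calls WHERE sip_number_id = ?",
            (sip_number_id,),
        ).fetchone()
        if count >= limit:
            return None
        row_id = str(uuid.uuid4())
        db.execute(
            "INSERT INTO active_calls (id, organization_id, sip_number_id, call_id_sip)"
            " VALUES (?, ?, ?, ?)",
            (row_id, org_id, sip_number_id, call_id_sip),
        )
        return row_id


def end_call(db, row_id: str) -> None:
    """Delete the row when the call ends, freeing the slot."""
    with db:
        db.execute("DELETE FROM active_calls WHERE id = ?", (row_id,))
```

On a shared database with several agent workers, the count and the insert would need to run in one transaction with appropriate locking, otherwise two calls arriving together could both pass the check.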

Phase 3: Dashboard UI (Frontend)

7. SIP Numbers Management Page

What: A new page or section in Settings where the business can see their assigned phone numbers and configure each one.

Features:
- Table showing: phone number, label, status (active/inactive), concurrent call limit
- Click a number to edit: label, greeting override, language override, system prompt override
- Show live call count (how many active calls on this number right now)
- Toggle active/inactive

Note: The actual number ASSIGNMENT is done by the client (stakeholder) via an admin panel or API. The business tenant can only VIEW and CONFIGURE their assigned numbers, not create new ones.

Where: src/app/(dashboard)/settings/ or a new src/app/(dashboard)/phone-numbers/ route.

8. Admin Panel: Number Assignment (for Client/Stakeholder)

What: A super-admin interface where the client (who owns the PBX) can:
- Add new phone numbers to the system
- Assign numbers to organizations
- Reassign numbers between organizations
- See which numbers are in use across all tenants

Who uses this: Only the client/stakeholder, not the business tenants.

Consideration: Could be a separate admin route (e.g., /admin/numbers) or an API-only feature initially.

9. Call Analytics / Live Call View

What: The Operations page should show:
- Currently active calls (live count) per number
- Call history with caller number, duration, tickets created
- Simple analytics: calls per day, average duration, busiest hours

Where: Enhance existing src/app/(dashboard)/operations/page.tsx

Phase 4: Security & Production Readiness

10. SIP IP Whitelisting

What: Currently, anyone who knows our IP (72.62.70.61:15060) can send SIP calls. We need to restrict this to ONLY the Yeastar PBX's IP address.

How: Two approaches:
- Firewall level: iptables/ufw rule on the VPS to DROP any UDP packet to port 15060 that does not come from the Yeastar's public IP
- LiveKit level: Configure the SIP trunk's allowed_addresses to only accept the PBX IP

The client's message explicitly mentions this: "The bot's SIP listener should silently drop any UDP packets on port 15060 that do not originate from the Yeastar PBX's designated IP address."
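The allowlist decision itself is simple and can be sketched with Python's standard ipaddress module. The address 203.0.113.10 is a documentation placeholder, not the Yeastar's real IP, and build_allowlist and is_allowed are hypothetical helpers. The actual enforcement belongs in the firewall or the LiveKit trunk config, not in agent code.

```python
import ipaddress


def build_allowlist(entries):
    """Accept single IPs ('203.0.113.10') or CIDR ranges ('203.0.113.0/28')."""
    return [ipaddress.ip_network(entry, strict=False) for entry in entries]


def is_allowed(source_ip: str, allowlist) -> bool:
    """True only if the packet's source address falls inside the allowlist."""
    try:
        addr = ipaddress.ip_address(source_ip)
    except ValueError:
        return False  # malformed source address: treat as not allowed
    return any(addr in network for network in allowlist)
```

A check like this is also useful in tests or monitoring, to confirm that logged SIP sources match what the firewall is supposed to permit.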

11. Graceful Call Rejection (Capacity Limits)

What: If the GPU is at capacity (all concurrent slots used), new calls should get a proper SIP rejection (486 Busy Here or 503 Service Unavailable) instead of just hanging.

How: Before the agent joins a Room, check concurrent call count. If over limit, do not join and let LiveKit handle the SIP rejection. Or configure LiveKit dispatch rules with a max participant limit.

Also needed: Ghost session cleanup. If a call drops without a SIP BYE (network failure), detect the silence after 10 to 15 seconds and force-close the session to free the slot.
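Ghost session cleanup could be sketched as a small watchdog that records when each call last produced inbound audio. GhostCallWatchdog is hypothetical, and using "last inbound audio seen" as the liveness signal is an assumption. The clock is injectable so the timeout can be tested without waiting.

```python
import time
from typing import Callable, Dict, List


class GhostCallWatchdog:
    """Flags calls that went quiet without ever sending a SIP BYE."""

    def __init__(self, timeout_s: float = 15.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout_s = timeout_s
        self._clock = clock
        self._last_audio: Dict[str, float] = {}

    def on_audio(self, call_id: str) -> None:
        """Call whenever inbound audio arrives for this call."""
        self._last_audio[call_id] = self._clock()

    def on_bye(self, call_id: str) -> None:
        """Normal hang-up: stop tracking the call."""
        self._last_audio.pop(call_id, None)

    def reap(self) -> List[str]:
        """Return and forget calls silent for at least timeout_s seconds."""
        now = self._clock()
        stale = [
            call_id
            for call_id, seen in self._last_audio.items()
            if now - seen >= self._timeout_s
        ]
        for call_id in stale:
            del self._last_audio[call_id]
        return stale
```

Whatever runs reap() periodically would then force-close each returned session and release its concurrency slot.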

12. Firewall Rules for RTP Port Range

What: Ensure the VPS firewall allows the full RTP media port range.

Current: Ports 10000 to 12000 are open for RTP. If we increase concurrent calls, verify this range is sufficient.

Rule of thumb: Budget 2 UDP ports per call (conventionally one for RTP audio and one for RTCP control; sending and receiving share the same port). With 50 concurrent calls, we need at least 100 ports. 10000 to 12000 gives us about 2000 ports, so plenty for now.
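The port arithmetic, taking the two-ports-per-call budget as given, can be checked with a couple of lines. Both function names are hypothetical.

```python
def rtp_ports_needed(concurrent_calls: int, ports_per_call: int = 2) -> int:
    """Ports required to serve the given number of simultaneous calls."""
    return concurrent_calls * ports_per_call


def max_calls_for_range(first_port: int, last_port: int,
                        ports_per_call: int = 2) -> int:
    """How many simultaneous calls an inclusive port range can carry."""
    return (last_port - first_port + 1) // ports_per_call
```

With these, rtp_ports_needed(50) is 100 and max_calls_for_range(10000, 12000) is 1000, so the open range is far above what a single GPU can serve concurrently.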

Active Development

Features Currently In Progress

These items are actively being worked on by our development team. Updates will be reflected here as work advances.

3 In Progress

2 New & In Progress

Last updated: April 13, 2026

Project Documentation

Architecture maps, developer guides, API references & onboarding materials

In Progress

Comprehensive documentation covering the full Vocale AI system architecture, including plane-by-plane breakdowns, API endpoint references, deployment runbooks, and a developer onboarding guide. This page you are currently viewing is part of that effort.

Architecture Map · API Reference · Developer Onboarding · Deployment Runbook

Admin Dashboard UI

Internal admin interface for managing all platform data tables and configurations

New

A full-featured admin dashboard providing CRUD interfaces and visibility into all core platform tables. Admins will be able to inspect, manage and audit data across every tenant from a single secure panel.

agent_activity_log · agent_config_defaults · agent_state · consent_records · conversation_turns · customer_db_logs · customer_field_mappings · customers · data_export_requests · external_db_connections · pbx_configurations · profiles · organizations · sip_trunks

Multi-Tenant PBX & SIP Configuration

Per-tenant SIP trunk provisioning, PBX config management & dynamic routing

In Progress

Building the full multi-tenant telephony layer: dynamic SIP dispatch rules, per-organization trunk assignments, dashboard configuration UI for PBX settings, IP whitelisting, and concurrent call routing. This resolves the current single-tenant hardcoded routing and unlocks true SaaS-grade telephony.

Dynamic SIP Routing · Per-Org Trunk Assignment · PBX Config UI · IP Whitelisting · Concurrent Call Handling

Email Automation — Multi-Tenant

Automated email workflows, triggers & delivery pipelines scoped per organization

New

A multi-tenant email automation module allowing each organization to configure triggered emails (e.g., post-call summaries, follow-ups, ticket notifications). Includes template management, SMTP/provider integration, delivery tracking, and per-tenant isolation of email rules and logs.

Email Triggers · Template Management · SMTP / Provider Integration · Delivery Tracking · Per-Tenant Isolation

Customer Data Upload — SQL, CSV & Manual

SaaS dashboard tools to import & manage customer records across all input formats

New

Enabling tenants to populate their customer data through three channels directly from the SaaS dashboard: SQL query import from external databases, CSV file upload with column mapping, and a manual entry form. Includes validation, field mapping, conflict resolution, and audit logs for all import operations.

SQL Import · CSV Upload · Manual Entry Form · Column Mapping · Import Audit Logs
