Salesforce Voice Stack Explained: Open CTI, Salesforce Voice, and Agentforce
An architectural guide for IT & operations leaders on navigating the layers of Salesforce telephony, understanding where AI fits in, and separating the marketing story from carrier-grade reality.
Introduction
If you have spent any time researching the Salesforce voice ecosystem, you have likely encountered a cluster of overlapping product names without a clear explanation of how they relate. Open CTI, Service Cloud Voice, Salesforce Voice, Agentforce Contact Center: each one represents a different layer of the same stack, but they are rarely presented that way.
The confusion is understandable. Does adopting Agentforce mean you no longer need a telephony provider? Is Open CTI being replaced? Where does Salesforce Voice actually fit? These names appear constantly in vendor and partner literature without a clear picture of how they connect, or whether they connect at all.
This guide is a plain-language explanation of how these pieces fit together. Telephony is not a single software product. It is a stack of distinct layers, each one doing a different job. When you understand the layers, the picture resolves. And as we will explore, Natterbox sits at the foundation of this stack, providing the carrier-grade voice infrastructure that makes everything above it perform.
What Are Telephony Layers and What Do They Mean?
Before getting into Salesforce’s product naming, it is worth understanding what actually happens when a phone call is made — not the abstract version, but the mechanical one.
Think of a phone call as a live postal service. When you dial a number, the network has two jobs to do: first set up the delivery channel, and then keep letters flowing in both directions in real time. Two separate protocols handle these two separate jobs.
Session Initiation Protocol (SIP) handles the logistics of the connection. It rings the other party, agrees the terms of the call, and formally ends it when you hang up. Think of it as the postal address, and the knock on the door.
Real-Time Transport Protocol (RTP) handles the content once the door is open. It carries the continuous stream of audio packets flowing in both directions, fast enough that neither party notices any delay. Think of it as the letters themselves, delivered and read in the same instant they are written.
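To make the "letters" concrete, here is a minimal sketch of the 12-byte fixed header that RTP puts in front of every audio packet, as laid out in RFC 3550. The `RtpHeader` class and its field names are illustrative choices rather than any vendor's implementation, and the sketch ignores the optional padding, extension, and CSRC fields.

```python
import struct
from dataclasses import dataclass

RTP_VERSION = 2
# Fixed RTP header: two flag bytes, 16-bit sequence, 32-bit timestamp, 32-bit SSRC.
_HEADER = struct.Struct("!BBHII")  # network byte order, 12 bytes in total


@dataclass
class RtpHeader:
    payload_type: int     # codec identifier, e.g. 0 = PCMU (G.711 mu-law)
    sequence: int         # increments by one per packet; gaps reveal loss
    timestamp: int        # sampling-clock ticks; drives playout timing
    ssrc: int             # identifies the source of the audio stream
    marker: bool = False  # often flags the first packet after silence

    def pack(self) -> bytes:
        byte0 = RTP_VERSION << 6  # no padding, no extension, no CSRC entries
        byte1 = (int(self.marker) << 7) | (self.payload_type & 0x7F)
        return _HEADER.pack(byte0, byte1, self.sequence & 0xFFFF,
                            self.timestamp & 0xFFFFFFFF, self.ssrc & 0xFFFFFFFF)

    @classmethod
    def unpack(cls, data: bytes) -> "RtpHeader":
        byte0, byte1, sequence, timestamp, ssrc = _HEADER.unpack_from(data)
        if byte0 >> 6 != RTP_VERSION:
            raise ValueError("not an RTP version 2 packet")
        return cls(payload_type=byte1 & 0x7F, sequence=sequence,
                   timestamp=timestamp, ssrc=ssrc, marker=bool(byte1 >> 7))
```

The sequence number and timestamp are the two fields the receiving side leans on most: one exposes lost packets, the other exposes packets that arrived late or unevenly spaced.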
How a voice AI call is built, layer by layer
The top layer, the large language model (LLM), is worth dwelling on. Nothing in telephony itself involves an LLM. An LLM does not route packets, negotiate SIP handshakes, or keep audio latency under 500 ms. It is a text-processing model that sits several layers above the physical network and never directly touches the call. The LLM’s output is only as good as the transcription it receives, and the transcription is only as good as the audio beneath it.
If the underlying infrastructure is weak (packet loss, latency, or jitter), the speech translation layer (speech-to-text) receives degraded audio and feeds the LLM garbled text. Garbage in, garbage out. Carrier-grade voice infrastructure is the non-negotiable foundation for any AI automation strategy built on voice.
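Jitter, one of the three impairments named above, has a precise definition. The sketch below follows the running interarrival jitter estimate described in RFC 3550 (section 6.4.1). The function name and the use of plain lists of timestamps are assumptions made for illustration; a real endpoint would update the estimate packet by packet as audio arrives.

```python
def interarrival_jitter(send_times, recv_times):
    """Running jitter estimate in the style of RFC 3550, section 6.4.1.

    Both lists hold one entry per packet, in arrival order, and must use
    the same time unit (for example milliseconds).
    """
    jitter = 0.0
    for i in range(1, len(send_times)):
        # Difference in transit time between this packet and the previous one.
        transit_delta = ((recv_times[i] - recv_times[i - 1])
                         - (send_times[i] - send_times[i - 1]))
        # Smooth with a gain of 1/16 so a single late packet does not dominate.
        jitter += (abs(transit_delta) - jitter) / 16
    return jitter
```

A stream whose packets arrive exactly as evenly spaced as they were sent scores zero. The value rises as spacing becomes uneven, which is the condition that degrades audio before it ever reaches speech-to-text.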
Understanding the Salesforce Stack
Most of the market confusion comes from Salesforce’s naming history rather than any genuine architectural complexity. Service Cloud Voice recently changed its name to Salesforce Voice. Meanwhile, Open CTI — Salesforce’s open developer framework for embedding telephony into the Salesforce UI — continues to serve as an integration path for third-party providers, though Salesforce has signalled a migration towards the more tightly integrated Salesforce Voice architecture by 2028. Sitting at the top of Salesforce’s current product push is Agentforce Contact Center, the autonomous AI layer.
Three different names, three different jobs, none of them interchangeable:
| Stack Layer | Salesforce Product | Technical Role | Infrastructure Owner | Natterbox Relationship |
|---|---|---|---|---|
| Cognitive AI | Agentforce Contact Center | Autonomous reasoning, intent classification, workflow automation | Salesforce (Software) | Natterbox integrates natively; acts as intelligent router, first-line AI, and observability layer |
| UI Unification | Salesforce Voice (formerly Service Cloud Voice) | Unifies voice calls, transcripts, and CRM data inside the agent console | Salesforce / Carrier | Natterbox is a certified Bring Your Own Telephony (BYOT) partner |
| Open Integration | Open CTI | Open developer API that embeds third-party telephony into Salesforce without browser plugins | Telephony Provider | Natterbox supports Open CTI; migrating clients to Salesforce Voice architecture ahead of 2028 |
What Each Telephony Layer Actually Does
Knowing precisely where each layer’s job starts and ends is what allows you to design a resilient architecture, and avoid assuming one layer’s responsibilities belong to another.
Open CTI
Open CTI is Salesforce’s open developer framework, a publicly documented JavaScript API that allows telephony providers to embed their dialler interface directly into Salesforce without requiring browser plugins. It is open in the sense that it is freely available to any provider to build against; third-party vendors like Natterbox use it to surface call controls, contact data, and activity logging natively inside the Salesforce UI.
What Open CTI does not do is carry calls, manage compliance, or own any part of the network. It is an integration surface. The telephony infrastructure it connects to (carrier relationships, call quality, recording, routing) remains the responsibility of the provider. With Salesforce consolidating its voice architecture around the Salesforce Voice model by 2028, organisations using Open CTI today should be planning migration to ensure continuity of real-time AI capabilities that depend on direct audio stream access.
Salesforce Voice (formerly Service Cloud Voice)
Salesforce Voice is the unification layer. Rather than running a separate browser frame, it brings voice directly into Salesforce’s native Omni-Channel engine, so calls are routable, loggable, and tied to the customer record in real time, sitting alongside every other channel in the agent’s workspace.
Salesforce Voice runs on a carrier underneath it. The default is Amazon Connect, a capable option within AWS infrastructure, though one that constrains flexibility around carrier choice, global routing, compliance customisation, and call quality ownership. For organisations with specific requirements around data residency, international number provisioning, or regulated recording obligations, a certified Bring Your Own Telephony partner gives considerably more control over how the infrastructure beneath Salesforce Voice is configured, managed, and held to account.
Agentforce Contact Center
Agentforce is the cognitive AI layer. It brings autonomous reasoning, intent classification, and Salesforce workflow automation to the contact center. During a live call it can surface relevant knowledge and account context; when a conversation ends, it can generate a structured summary and write it directly to the Salesforce record. It classifies intent from natural language and integrates natively with Salesforce data objects throughout.
Agentforce is software built on top of a contact center stack. It is not the stack itself. It cannot provision a phone number, manage a carrier relationship, or satisfy a compliance recording obligation at the infrastructure level. For a deeper look at how these two layers interact in practice, see our guide to Agentforce Contact Center and voice infrastructure.
Clearing Up the Confusion
The conflation of these three products is partly a marketing problem. Salesforce sells them within the same ecosystem, with each product update expanding capabilities in a way that blurs the layer boundaries. The naming changes don’t help. And the product names suggest a completeness that the underlying architecture does not deliver on its own.
The core misconception is that adopting Agentforce Contact Center means you no longer need a dedicated telephony provider. This is precisely what causes voice architectures to break down.
Agentforce is a capable software engine. It is not a telecom carrier. It does not provision numbers, manage carrier relationships, own call quality, or satisfy compliance obligations at the network level. Remove the telephony provider and Agentforce has no voice to process, and no ears to hear it with.
Infrastructure That Cannot Be Replaced
Regardless of which Salesforce layer you deploy, the need for carrier-grade voice infrastructure remains constant. Software cannot replace the physical reality of telecom operations.
Global Voice Infrastructure and Call Quality
Call quality, latency, jitter, and packet loss are infrastructure problems. Agentforce cannot resolve a degraded call. Salesforce Voice cannot provision a local number in a new market. Open CTI cannot manage long-distance carrier relationships. These are network-layer concerns that require a provider with genuine physical presence in the global telecom network, managing regional Points of Presence, carrier relationships, and end-to-end routing to keep audio clean and connections stable.
Number Provisioning and Carrier Management
Enterprises operating across multiple countries need local number provisioning, Calling Line Identification (CLI) management, number porting, and carrier failover. If a regional carrier goes down, traffic must automatically reroute. None of this is a CRM operation. It requires direct integration with the global telecom network, and it has to work before a single audio packet reaches your Salesforce environment.
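As a rough illustration of what automatic rerouting involves, the sketch below picks the preferred healthy carrier from a prioritised list. The `Carrier` type, its fields, and the carrier names in the usage example are hypothetical. In a real network the health flag would be driven by live probes and call-completion statistics rather than set by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Carrier:
    name: str       # hypothetical label for a regional carrier route
    priority: int   # lower value = preferred route
    healthy: bool   # assumed to be maintained by an external health check


def select_carrier(carriers):
    """Return the preferred healthy carrier, or None if every route is down."""
    candidates = [c for c in carriers if c.healthy]
    return min(candidates, key=lambda c: c.priority, default=None)


# Usage: the primary route is down, so traffic falls over to the secondary.
routes = [
    Carrier("primary-uk", priority=1, healthy=False),
    Carrier("secondary-uk", priority=2, healthy=True),
]
chosen = select_carrier(routes)
```

The point of the sketch is where the decision lives: it is made in the telephony layer, per call, before anything reaches the CRM.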
Compliance at the Infrastructure Level
Highly regulated industries have obligations that extend far beyond what a CRM or AI layer can satisfy. Many must be enforced at the telephony infrastructure level, before audio touches any application.
In the UK and EU:
- FCA & MiFID II call recording: All relevant communications must be captured, retained for a minimum of five years, and be retrievable on demand for regulatory review. The obligation applies to the conversation itself, which means the telephony infrastructure must record it at source.
- GDPR & data residency: Audio files and transcripts must be stored in compliance with strict regional data sovereignty rules. Where a customer’s data is held, and under which jurisdiction, is an infrastructure decision, not a CRM configuration.
In the United States:
- TCPA (Telephone Consumer Protection Act): Governs automated calling and texting, including strict rules around prior express written consent for outbound communications and defined calling windows. Violations carry statutory damages per call. Compliance controls (Do Not Call scrubbing, consent verification, frequency caps) must be enforced at the telephony infrastructure level.
- SEC & FINRA call recording requirements: Broker-dealers and investment advisers must retain records of their business communications, and some firms are additionally required to record telephone calls. Under FINRA Rule 3110, firms must have supervisory systems in place, a requirement that reaches down to the telephony infrastructure level.
- Dodd-Frank Act: Imposes recording and retention requirements on swap dealers and major swap participants, covering pre-trade and post-trade voice communications for firms operating in derivatives markets.
- State-level recording consent laws: Federal law permits a call to be recorded with the consent of just one party, but a number of US states (including California, Florida, Illinois, and Washington) require all-party consent. Without active compliance announcements at the telephony infrastructure level, organisations making or receiving calls across state lines face meaningful legal exposure.
- HIPAA (Health Insurance Portability and Accountability Act): For healthcare and health-adjacent financial services firms, any voice conversation touching patient data that is recorded, stored, or transmitted must be handled in a HIPAA-compliant environment, a requirement that applies to the telephony infrastructure, not only the application layer above it.
Across both regions:
- PCI DSS scope management: The Payment Card Industry Data Security Standard requires that cardholder data (including card numbers spoken on a call) is never exposed to systems outside the defined secure perimeter. Secure DTMF capture and digital payment links are telephony infrastructure capabilities that reduce PCI scope for the entire CRM and AI environment built on top.
The common thread: the compliance obligation attaches to the conversation, not to the software processing it. A CRM cannot retroactively make a call compliant. The infrastructure has to be designed for it from the first packet.
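To show what enforcing a rule "from the first packet" can look like, here is a minimal sketch of the state-level consent check described above: deciding, before recording starts, whether a recording announcement must be played. The state set contains only the four states named in this article and is deliberately not exhaustive. The function name and the conservative handling of unknown locations are assumptions, and the sketch is an illustration rather than a legal reference.

```python
# Non-exhaustive: only the all-party-consent states named in this article.
ALL_PARTY_CONSENT_STATES = {"CA", "FL", "IL", "WA"}


def needs_recording_announcement(caller_state, callee_state):
    """Decide whether to announce recording before capturing audio.

    States are two-letter codes; None means the location is unknown,
    which this sketch treats conservatively by requiring the announcement.
    """
    parties = (caller_state, callee_state)
    if any(state is None for state in parties):
        return True
    return any(state.upper() in ALL_PARTY_CONSENT_STATES for state in parties)
```

Because the check runs at call setup, the announcement can be played before any audio is captured, which is something an application sitting downstream of the recording cannot do after the fact.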
How the Stack Stacks Up
The question is never “which one of these do I choose?” but “how do these layers work together?” Natterbox integrates natively with all three Salesforce products, and the role it plays in this stack goes well beyond carrying audio from one end to the other.
The first voice on the line
The difference between a conventional IVR and what Natterbox does is that routing does not start with a menu. When a call connects, a Natterbox AI Voice Agent answers first. It has a real conversation: it asks what the caller needs, listens in natural language, cross-references the Salesforce record in real time, and makes a routing decision based on actual intent rather than whichever button the caller pressed. The AI is available 24 hours a day, 365 days a year, handling every call that lands regardless of business hours, capacity, or queue depth.
Based on that conversational exchange, Natterbox routes to wherever makes most sense:
- A human agent via Salesforce Omni-Channel, with the caller’s full CRM context surfaced before the agent picks up.
- Continued AI handling, where the Natterbox AI Voice Agent resolves the query directly by drawing on ingested knowledge bases, website content, and Salesforce data objects, with guardrails active throughout.
- An Agentforce Contact Center workflow, with full conversation context passed across so the AI never has to start from scratch.
The routing logic is CRM-aware throughout: Natterbox draws on live Salesforce data to apply priority handling for high-value customers, match callers to agents with the right skills, and respect overflow rules across every scenario.
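The three routing outcomes above can be sketched as a small decision function. Everything here is hypothetical: the intent labels, the `CallContext` fields, and the order of the rules are invented for illustration and are not Natterbox's actual routing logic, which the article describes only in outline.

```python
from dataclasses import dataclass
from enum import Enum


class Destination(Enum):
    HUMAN_AGENT = "human_agent"
    AI_VOICE_AGENT = "ai_voice_agent"
    AGENTFORCE_WORKFLOW = "agentforce_workflow"


@dataclass
class CallContext:
    intent: str                # label produced by the conversational front end
    high_value_customer: bool  # assumed to come from the Salesforce record
    agents_available: bool     # assumed to come from Omni-Channel presence


# Hypothetical intent groupings, for illustration only.
AI_RESOLVABLE = {"opening_hours", "order_status"}
WORKFLOW_INTENTS = {"update_address", "book_appointment"}


def route(ctx):
    """Pick a destination from caller intent plus CRM context."""
    if ctx.high_value_customer and ctx.agents_available:
        return Destination.HUMAN_AGENT          # priority handling
    if ctx.intent in WORKFLOW_INTENTS:
        return Destination.AGENTFORCE_WORKFLOW  # hand off with context
    if ctx.intent in AI_RESOLVABLE or not ctx.agents_available:
        return Destination.AI_VOICE_AGENT       # resolve directly
    return Destination.HUMAN_AGENT
```

What the sketch is meant to convey is the shape of the inputs: the decision combines what the caller said with what the CRM already knows, which a keypad menu cannot do.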
Observability: what you can see when the system is running
One of the less visible, but practically significant, advantages of infrastructure-level ownership is observability. When the telephony layer is native to Salesforce, every call leaves a trail you can inspect in detail: not in aggregate, but conversation by conversation, decision by decision.
Live supervision. Supervisors can listen to any active call without the agent or caller knowing (silent monitoring), or coach the agent mid-call without the caller hearing, either by pushing text guidance to the agent’s screen or by speaking to the agent directly (whisper coaching). When a call needs direct intervention, supervisors can join it (barge). The customer’s experience is uninterrupted throughout.
Floor-level visibility. All active calls, queues, and agent states are visible in real time from the Salesforce Omni-Channel Supervisor tab. Floor managers get the same live picture regardless of whether agents are handling voice, chat, or any other channel.
AI transparency. For any AI-handled interaction, supervisors can inspect step by step what the AI understood, what decision it made, and why. Natterbox’s conversation debugging tools make that level of scrutiny straightforward. In regulated environments this matters considerably: you are not auditing an outcome, you are auditing the reasoning that produced it. That distinction tends to matter when a regulator asks questions.
Quality intelligence. Rather than sampling 1-2% of calls for quality review, Natterbox’s AI quality scoring assesses every call, and it does so without requiring survey responses from customers. Self-service deflection measurement tracks AI resolution rate against human escalation rate automatically, giving operations leaders a clear, data-driven picture of where automation is working and where it is not.
Post-call intelligence. After every call, AI generates a structured summary and writes it directly to the relevant Salesforce record. Wrap-up fields are completed automatically. Because Natterbox is built as a Salesforce-native managed package, all call data, AI decision logs, and interaction summaries are written to native Salesforce reporting objects, giving compliance, operations, and leadership a single, unified audit trail rather than a parallel system to reconcile.
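The deflection measurement itself is simple arithmetic. As a sketch, assuming every AI-handled call ends in exactly one of two outcomes (resolved by the AI or escalated to a human), the two rates can be computed as below. The function name and the two-outcome assumption are illustrative.

```python
def deflection_rates(ai_resolved, escalated_to_human):
    """Return (AI resolution rate, human escalation rate) as fractions."""
    total = ai_resolved + escalated_to_human
    if total == 0:
        return 0.0, 0.0  # no AI-handled calls yet
    return ai_resolved / total, escalated_to_human / total


# Usage with made-up figures: 750 calls resolved by AI, 250 escalated.
resolution_rate, escalation_rate = deflection_rates(750, 250)
```

With those made-up figures the resolution rate is 0.75 and the escalation rate is 0.25. The useful signal is usually the trend in these numbers per intent or per queue rather than a single headline figure.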
| Layer | What it does | Owner |
|---|---|---|
| Network Transport | SIP/RTP: establishes connections and carries live audio packets | Natterbox |
| Global Carrier & Numbers | Local number provisioning, CLI management, porting, and carrier failover across regions | Natterbox |
| Compliance Recording | FCA, MiFID II, FINRA, TCPA, Dodd-Frank, PCI DSS, HIPAA, and state-level consent obligations enforced at the telephony infrastructure level | Natterbox |
| Speech Translation (STT/TTS) | Converts live audio to text for AI processing; converts AI responses back to natural speech | Natterbox |
| 24/7 AI Voice Agents | Answers every inbound call conversationally at any time, gauges intent in natural language, and queries the Salesforce record before any routing decision is made | Natterbox |
| Knowledge Base & Resolution | AI resolves queries by drawing on ingested documents, website content, and native Salesforce Knowledge, with configurable guardrail profiles and persona configuration active throughout | Natterbox |
| Intelligent Routing | AI-first, CRM-aware, skills-based, priority, and dynamic routing, including AI agents as first-class routing destinations alongside human queues | Natterbox |
| Human-AI Handoff | Bot-to-human and agent-to-agent escalation with full conversation context preserved; the caller perceives one continuous interaction | Natterbox |
| External System Actions | AI retrieves data from and executes actions in third-party systems during live conversations, including payment processors, order management, and external databases | Natterbox |
| Post-call Intelligence | AI-generated call summaries and automated wrap-up fields written back to Salesforce after every interaction | Natterbox |
| Live Supervision & Coaching | Silent monitoring, whisper coaching (text and audio), barge, and Omni-Channel Supervisor visibility across all active calls and queues | Natterbox |
| AI Observability | Conversation debugging, AI quality scoring across 100% of interactions, and deflection measurement, all surfaced in native Salesforce reporting | Natterbox |
| UI Unification | Agent console, Omni-Channel presence, and real-time transcripts unified inside Salesforce | Salesforce Voice |
| Workflow AI & Automation | Autonomous reasoning, Einstein intent classification, and native Salesforce workflow automation | Agentforce Contact Center |
| CRM Records & Reporting | Customer data objects, activity logs, and native Salesforce reporting, the system of record for all interaction data | Salesforce |
Before You Decide: A Decision-Making Framework
The right questions when evaluating a voice architecture are not about features. They are about accountability. These four tend to locate the gaps faster than any vendor demo. For a more complete evaluation framework built around Salesforce specifically, our AI contact center evaluation guide covers each of these considerations in greater depth.
Do you operate across multiple countries and need local numbers in each market?
If your customer base is global, a single-region cloud telephony setup will not cover it. You need a provider with a global network of regional Points of Presence to keep latency below the threshold where calls start to feel wrong, and the regulatory expertise to provision and port local numbers across jurisdictions, each with its own rules about what a number can and cannot do.
What are your compliance recording obligations, and who is accountable for them at the infrastructure level?
Legal teams know what needs to be recorded and retained. The question is whether the telephony provider has contractually committed to delivering it at the telephony infrastructure level, rather than just offering a setting in a dashboard. In regulated industries, the distance between those two things is where audits get complicated.
When call quality degrades, who do you call?
If an agent’s call is dropping or choppy, Salesforce cannot troubleshoot the carrier network. You need a telephony partner who owns the network layer end-to-end and can provide real-time diagnostics. If the answer involves escalating to a carrier, who escalates to their upstream provider, you are already two handoffs from resolution.
Are your AI automation goals contingent on having a reliable voice infrastructure underneath them?
Always. An AI agent is only as capable as the transcription it receives, and the transcription is only as accurate as the audio beneath it. Investing in Agentforce Contact Center without validating the network layer is a straightforward way to spend a significant budget on a system that underperforms, and have the failure attributed to AI when the real problem is infrastructure.
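One common way to validate the transcription layer before investing further up the stack is word error rate (WER): the number of word-level substitutions, insertions, and deletions needed to turn the transcript into a trusted reference, divided by the length of the reference. The sketch below is a minimal implementation using a standard edit-distance calculation. The function name is an assumption, and real evaluations usually normalise case and punctuation first.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] = edits needed to turn the first j hypothesis words
    # into the reference words processed so far.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # reference word missing
                            curr[j - 1] + 1,      # extra hypothesis word
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)
```

Comparing WER on calls carried over clean and degraded network paths is one way to separate an infrastructure problem from an AI problem before the AI takes the blame.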
Open CTI vs. Salesforce Voice vs. Agentforce: What Do You Really Need?
To return to the central point: these are not three competing products. They are three layers of a single, modern voice stack, each one doing a distinct job that the other two cannot.
You do not need to choose between Salesforce Voice and Agentforce. You can use Salesforce Voice to unify the agent workspace and Agentforce to automate routine workflows. The prerequisite is that both sit on carrier-grade voice infrastructure that handles the physical call, satisfies the compliance obligation, and connects the conversation to the right destination: a human advisor, a Natterbox AI Voice Agent resolving the query directly, or a broader Agentforce workflow.
If you are still mapping the broader landscape of Salesforce contact center options, our guide to the best Salesforce contact center solutions is a useful companion to this article.
Natterbox is not a Salesforce competitor. We are the voice infrastructure that makes Salesforce perform. If you are currently mapping your voice architecture and want to understand how these layers fit together in your specific environment, our team is happy to walk through it.