Audio Mode
Audio mode is for teams that want raw audio, not text turn-taking.
You use this when your agent stack already speaks WebSocket and you care about voice-native control.
When to use it
- OpenAI Realtime
- custom ASR or TTS
- interruption handling under your control
- low-latency voice orchestration
Create an audio line
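A sketch of creating a line, assuming a JSON request body. The only field name taken from this page is audio_handler_url; the mode field, the endpoint path, and the response shape are assumptions, so check them against the API reference before relying on this.

```typescript
// Hypothetical request body for creating an audio line.
// `audio_handler_url` is documented; `mode` is an assumption.
interface CreateAudioLineRequest {
  mode: "audio";
  audio_handler_url: string; // your WebSocket server
}

function buildCreateLineRequest(handlerUrl: string): CreateAudioLineRequest {
  return { mode: "audio", audio_handler_url: handlerUrl };
}

// Sending it would look roughly like this (endpoint URL is illustrative):
// await fetch("https://api.saperly.example/v1/lines", {
//   method: "POST",
//   headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
//   body: JSON.stringify(buildCreateLineRequest("wss://agent.example.com/audio")),
// });
```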
Connection flow
- an inbound or outbound call starts
- Saperly gives your system a relay URL
- your system connects over WebSocket
- audio frames move in both directions
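The connect step can be sketched as follows. The format query parameter is documented below under Audio formats; the relay host in the comment is a placeholder, not a real endpoint.

```typescript
// Attach the codec to the relay URL via the `format` query parameter,
// then open the WebSocket. Binary framing is an assumption here.
function withFormat(
  relayUrl: string,
  format: "pcm16_16k" | "pcm16_24k" | "mulaw_8k",
): string {
  const u = new URL(relayUrl);
  u.searchParams.set("format", format);
  return u.toString();
}

// const ws = new WebSocket(withFormat(relayUrl, "pcm16_16k"));
// ws.binaryType = "arraybuffer"; // receive frames as ArrayBuffer, not Blob
```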
Relay messages you receive
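The exact message schema is not reproduced here; the sketch below assumes a JSON envelope with a type field plus base64 audio payloads, a common shape for audio relays. Every type name and field below is an assumption to verify against the relay reference.

```typescript
// Assumed inbound envelope: control events plus base64 audio frames.
type RelayMessage =
  | { type: "call.started"; call_id: string }
  | { type: "audio"; payload: string } // base64-encoded frame
  | { type: "call.ended"; call_id: string };

// Parse a text frame; non-JSON input (e.g. raw binary handled elsewhere)
// yields null rather than throwing.
function parseRelayMessage(raw: string): RelayMessage | null {
  try {
    const msg = JSON.parse(raw);
    return typeof msg?.type === "string" ? (msg as RelayMessage) : null;
  } catch {
    return null;
  }
}
```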
Messages you send back
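Outbound shapes are likewise assumptions: an audio frame to play toward the caller, and a clear message to drop queued audio when handling an interruption. Confirm both names against the relay reference.

```typescript
// Assumed outbound messages. `clear` stands in for whatever the relay
// uses to flush queued playback on barge-in.
function audioOut(frame: Uint8Array): string {
  return JSON.stringify({
    type: "audio",
    payload: Buffer.from(frame).toString("base64"),
  });
}

function clearPlayback(): string {
  return JSON.stringify({ type: "clear" });
}
```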
Audio formats
Saperly can bridge between carrier audio and the format your agent stack wants. Use a format query parameter on the relay URL to select the codec.
pcm16_16k is usually the sane default when connecting to modern realtime models. Use pcm16_24k when your downstream model expects it (OpenAI Realtime), and mulaw_8k when you want to stay closest to the carrier stream.
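The formats imply different per-frame sizes, which matters for the backpressure math later. Sample rates and widths follow from the format names; the 20 ms frame duration is a typical choice for illustration, not a guarantee of what the relay uses.

```typescript
// Bytes per frame = sampleRate * (frameMs / 1000) * bytesPerSample.
const FORMATS = {
  pcm16_16k: { sampleRate: 16000, bytesPerSample: 2 },
  pcm16_24k: { sampleRate: 24000, bytesPerSample: 2 },
  mulaw_8k: { sampleRate: 8000, bytesPerSample: 1 },
} as const;

function frameBytes(format: keyof typeof FORMATS, frameMs = 20): number {
  const { sampleRate, bytesPerSample } = FORMATS[format];
  return (sampleRate * frameMs / 1000) * bytesPerSample;
}
// frameBytes("pcm16_16k") → 640, frameBytes("pcm16_24k") → 960, frameBytes("mulaw_8k") → 160
```

So a mulaw_8k stream moves a quarter of the bytes of pcm16_16k, at the cost of fidelity.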
TypeScript WebSocket client
A minimal handler that decodes inbound frames, processes them, and writes audio back.
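A sketch of that handler, assuming the relay delivers raw pcm16 frames as binary WebSocket messages. The "processing" is a trivial gain stage standing in for your agent pipeline, and the socket interface is narrowed to the members used so the sketch compiles without DOM typings.

```typescript
// Minimal socket surface; real code would pass a WebSocket instance.
interface RelaySocket {
  binaryType: string;
  onmessage: ((ev: { data: ArrayBuffer | string }) => void) | null;
  send(data: ArrayBufferLike): void;
}

// Placeholder processing: apply gain and clamp to the int16 range.
function processFrame(frame: Int16Array, gain = 1.0): Int16Array {
  const out = new Int16Array(frame.length);
  for (let i = 0; i < frame.length; i++) {
    const v = Math.round(frame[i] * gain);
    out[i] = Math.max(-32768, Math.min(32767, v));
  }
  return out;
}

function attachAudioHandler(ws: RelaySocket): void {
  ws.binaryType = "arraybuffer";
  ws.onmessage = (ev) => {
    if (!(ev.data instanceof ArrayBuffer)) return; // text control messages handled elsewhere
    const inbound = new Int16Array(ev.data);
    const outbound = processFrame(inbound);
    ws.send(outbound.buffer); // write processed audio back to the caller
  };
}
```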
Rate limiting and backpressure
Practical guidance:
- Keep per-frame processing under the frame interval (20ms). Offload heavy work to a queue.
- Watch ws.bufferedAmount on your outbound socket. If it climbs, your sender is faster than the network.
- Prefer small frames and steady cadence over large chunks sent in bursts.
Integration: OpenAI Realtime API
Create an audio line with pcm16_24k
Set audio_handler_url to your WebSocket server and use ?format=pcm16_24k on the relay URL.
Troubleshooting
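A frequent symptom of a codec mismatch is frames whose byte length does not divide evenly by the expected frame size. This check is a debugging aid, not part of the Saperly API; it assumes 20 ms frames, so adjust the expected sizes if your framing differs.

```typescript
// Expected bytes per 20 ms frame for each supported format.
const EXPECTED_FRAME_BYTES = {
  pcm16_16k: 640,
  pcm16_24k: 960,
  mulaw_8k: 160,
} as const;

// True when an inbound frame's length is inconsistent with the
// negotiated format — a hint that the wrong codec is in play.
function looksLikeCodecMismatch(
  frameByteLength: number,
  format: keyof typeof EXPECTED_FRAME_BYTES,
): boolean {
  return frameByteLength % EXPECTED_FRAME_BYTES[format] !== 0;
}
```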
Build advice
Do not start here unless you already know why webhook mode is insufficient.
Audio mode is powerful, but it has more failure surfaces:
- dropped frames
- timing drift
- backpressure
- interruption logic
- codec mismatches
If your product does not need that level of control, start with hosted mode or webhook mode instead.
