Voice Implementation Notes
These are the lessons that tend to matter most when you move from a prototype to real phone calls.
- Twilio strips query parameters from WebSocket stream URLs. If you need per-call routing, encode it in the path, for example /voice/ws/reminder or /voice/ws/followup, then pass the parsed value to your Durable Object in a request header.
- TwiML is XML, so any URL you interpolate into it must be XML-escaped. An unescaped special character such as & or < makes the document malformed, and Twilio will fail to parse the TwiML response.
- Twilio sends the connected and start events as separate messages. The start event carries the streamSid, which every outbound media message must include, so you cannot send audio back until it has arrived.
- OpenAI can begin returning audio before Twilio has sent the start event. Buffer outbound audio until the start event arrives, then flush the buffer in order.
- Attach the Twilio WebSocket listeners before any database work or other async setup. If you await that work first, early connected or start frames can arrive before a listener exists and be missed.
- Avoid using waitUntil() as the main wrapper around the live WebSocket session. Let the open WebSocket keep the Durable Object active, and reserve waitUntil() for background side effects such as logging.
- Native function calling is much more reliable than trying to regex-parse spoken confirmations after the call.
- Keep the prompt focused on the conversation goal, then put exact field requirements into the tool schema.
- If you need a very stable opening line, reinforce it with instructions on the first-turn response.create event in addition to the session prompt.
- Keep a raw ordered transcript plus the structured tool payload. That combination makes debugging much easier than relying on only one or the other.
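A minimal sketch of the path-based routing from the first note. It assumes the two call kinds named there and uses a hypothetical X-Call-Kind header to hand the parsed value to the Durable Object. The Worker would merge these headers into the request it forwards to the Durable Object stub. Unknown paths return null so they can be rejected before the WebSocket upgrade.

```typescript
// Call kinds from the notes above; extend as needed.
const CALL_KINDS = ["reminder", "followup"] as const;
type CallKind = (typeof CALL_KINDS)[number];

// Hypothetical header name for passing the route to the Durable Object.
const CALL_KIND_HEADER = "X-Call-Kind";

// Parse /voice/ws/<kind> from the stream URL. Query strings are ignored
// because Twilio strips them anyway.
function parseCallKind(url: string): CallKind | null {
  const { pathname } = new URL(url);
  const match = /^\/voice\/ws\/([^/]+)\/?$/.exec(pathname);
  if (!match) return null;
  const kind = match[1];
  return (CALL_KINDS as readonly string[]).includes(kind) ? (kind as CallKind) : null;
}

// Headers to attach when forwarding the upgrade request to the Durable Object.
function routingHeaders(url: string): Record<string, string> | null {
  const kind = parseCallKind(url);
  return kind ? { [CALL_KIND_HEADER]: kind } : null;
}
```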
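The escaping note can be illustrated with a small sketch that builds a Twilio Connect/Stream response. The function names are illustrative. The escaping covers the five XML special characters, and & is replaced first so the entities produced by the later replacements are not escaped a second time.

```typescript
// Escape the five XML special characters for use inside an attribute value.
// "&" goes first so later replacements are not double-escaped.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// TwiML that connects the call to a bidirectional media stream.
function streamTwiml(streamUrl: string): string {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    "<Response>",
    "  <Connect>",
    `    <Stream url="${escapeXml(streamUrl)}" />`,
    "  </Connect>",
    "</Response>",
  ].join("\n");
}
```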
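The start-event and buffering notes can be combined into one small class. This is a sketch under a few assumptions. OutboundAudio and Sender are hypothetical names. The payloads are taken to be base64 audio already in Twilio's 8 kHz mu-law format, for example the delta field of OpenAI's audio delta events with g711_ulaw output. The outgoing message uses Twilio's bidirectional Media Streams shape, which is event "media" plus streamSid and media.payload.

```typescript
// Minimal view of the Twilio socket: only send() is needed here.
interface Sender {
  send(data: string): void;
}

// Holds outbound audio until Twilio's "start" event supplies the streamSid,
// then flushes queued payloads in arrival order.
class OutboundAudio {
  private readonly twilio: Sender;
  private streamSid: string | null = null;
  private pending: string[] = [];

  constructor(twilio: Sender) {
    this.twilio = twilio;
  }

  // Feed every parsed Twilio message here; only "start" matters to this class.
  handleTwilioMessage(msg: { event?: string; start?: { streamSid?: string } }): void {
    if (msg.event === "start" && msg.start?.streamSid) {
      this.onStart(msg.start.streamSid);
    }
  }

  // Call for each base64 audio chunk coming back from OpenAI.
  push(payload: string): void {
    if (this.streamSid === null) {
      this.pending.push(payload);
    } else {
      this.sendNow(payload);
    }
  }

  private onStart(streamSid: string): void {
    this.streamSid = streamSid;
    for (const payload of this.pending) this.sendNow(payload);
    this.pending = [];
  }

  private sendNow(payload: string): void {
    this.twilio.send(
      JSON.stringify({ event: "media", streamSid: this.streamSid, media: { payload } }),
    );
  }
}
```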
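The advice about attaching listeners early can be sketched as a helper that registers the message listener synchronously and queues frames until async setup finishes. MessageSource and attachEarly are hypothetical names. The sketch assumes the accept() plus addEventListener style of WebSocket handling, not the hibernation API.

```typescript
type MessageHandler = (data: string) => void;

// Minimal socket shape used for illustration.
interface MessageSource {
  addEventListener(type: "message", listener: (event: { data: unknown }) => void): void;
}

// Attach the listener immediately and queue frames until setup (database
// reads, opening the OpenAI connection) is done. The returned function
// installs the real handler and replays queued frames in order.
function attachEarly(ws: MessageSource): (handler: MessageHandler) => void {
  const queued: string[] = [];
  let handler: MessageHandler | null = null;

  ws.addEventListener("message", (event) => {
    const data = String(event.data);
    if (handler) handler(data);
    else queued.push(data);
  });

  return (h: MessageHandler) => {
    handler = h;
    for (const data of queued.splice(0)) h(data);
  };
}
```

A typical flow calls attachEarly right after accepting the socket, awaits the setup work, and then passes the real message handler to the returned function.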
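To illustrate keeping the prompt about the goal and the field rules in the schema, here is a sketch of a tool definition and its handling. The record_call_outcome tool and its fields are hypothetical. The tools and tool_choice fields on session.update, the arguments string from the response.function_call_arguments.done event, and the function_call_output item follow the Realtime API's beta event shapes. Confirm them against the current reference before relying on them.

```typescript
// Hypothetical tool: the prompt states the goal, the schema states exact fields.
const recordOutcomeTool = {
  type: "function",
  name: "record_call_outcome",
  description: "Call once the caller has clearly confirmed or declined the appointment.",
  parameters: {
    type: "object",
    properties: {
      confirmed: { type: "boolean", description: "True if the caller confirmed." },
      reschedule_requested: { type: "boolean" },
      notes: { type: "string", description: "Short summary, under 200 characters." },
    },
    required: ["confirmed", "reschedule_requested"],
    additionalProperties: false,
  },
} as const;

const sessionUpdate = {
  type: "session.update",
  session: {
    instructions: "You are calling to confirm an upcoming appointment. Be brief and polite.",
    tools: [recordOutcomeTool],
    tool_choice: "auto",
  },
};

interface CallOutcome {
  confirmed: boolean;
  reschedule_requested: boolean;
  notes?: string;
}

// Validate the JSON arguments string the model produced for the tool call.
function parseOutcome(args: string): CallOutcome | null {
  try {
    const v = JSON.parse(args);
    if (typeof v.confirmed !== "boolean" || typeof v.reschedule_requested !== "boolean") return null;
    if (v.notes !== undefined && typeof v.notes !== "string") return null;
    return v as CallOutcome;
  } catch {
    return null;
  }
}

// Return the tool result to the model, then ask it to continue speaking.
function toolResultEvents(callId: string, result: unknown) {
  return [
    {
      type: "conversation.item.create",
      item: { type: "function_call_output", call_id: callId, output: JSON.stringify(result) },
    },
    { type: "response.create" },
  ];
}
```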
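The stable-opening note comes down to adding instructions to the first response.create event. In this sketch the greeting text and function name are hypothetical. Instructions set on response.create apply to that single response, so the wording should match what the session prompt already says about the opening.

```typescript
// Hypothetical greeting; keep it consistent with the session prompt.
const GREETING = "Hi, this is the clinic calling about your appointment tomorrow. Is now a good time?";

// First-turn event that pins the opening line for this one response.
function firstTurnEvent(greeting: string) {
  return {
    type: "response.create",
    response: {
      instructions: `Start the call by saying exactly: "${greeting}" Then wait for the caller to reply.`,
    },
  };
}

// Sent once, after the session.update that carries the session prompt.
const openingEvent = firstTurnEvent(GREETING);
```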
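One way to keep the raw transcript and the structured tool payload together is a single per-call log where both share one sequence counter, which preserves their relative order. CallLog and its methods are hypothetical names. Transcript text would come from the Realtime API's transcription events, whose names differ between API versions.

```typescript
type Role = "caller" | "assistant";

type LogEntry =
  | { seq: number; at: number; kind: "transcript"; role: Role; text: string }
  | { seq: number; at: number; kind: "tool_call"; name: string; args: unknown };

// Transcript turns and tool calls share one sequence counter.
class CallLog {
  private entries: LogEntry[] = [];
  private nextSeq = 0;

  addTranscript(role: Role, text: string, at = Date.now()): void {
    this.entries.push({ seq: this.nextSeq++, at, kind: "transcript", role, text });
  }

  addToolCall(name: string, args: unknown, at = Date.now()): void {
    this.entries.push({ seq: this.nextSeq++, at, kind: "tool_call", name, args });
  }

  // Plain-text rendering for debugging a single call.
  render(): string {
    return this.entries
      .map((e) =>
        e.kind === "transcript"
          ? `${e.seq} ${e.role}: ${e.text}`
          : `${e.seq} tool ${e.name} ${JSON.stringify(e.args)}`,
      )
      .join("\n");
  }

  // Most recent structured payload for a tool, if it was called.
  lastToolPayload(name: string): unknown {
    for (let i = this.entries.length - 1; i >= 0; i--) {
      const e = this.entries[i];
      if (e.kind === "tool_call" && e.name === name) return e.args;
    }
    return undefined;
  }
}
```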
Realtime Tuning
- Phone audio usually needs server VAD tuning. A threshold of roughly 0.55 to 0.65 and a silence_duration_ms of roughly 400 to 650 ms are reasonable starting points, but test them against your own callers and background noise.
- If VAD is too eager (threshold or silence duration too low), the assistant will interrupt callers or react to noise. If it is too conservative (values too high), it will feel sluggish and may miss short replies.
- Model upgrades can change behavior. Re-test greeting stability, interruption behavior, and tool timing when you change Realtime models.
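A sketch of applying the starting ranges above through a session.update event, with basic bounds checks. The phoneVad helper is hypothetical, and the prefix_padding_ms of 300 is an assumed value that does not come from these notes. The field names follow the Realtime API's server_vad turn_detection object in the beta event shape. Newer API versions may place turn detection elsewhere in the session config, so check the current reference.

```typescript
interface ServerVadConfig {
  type: "server_vad";
  threshold: number; // 0 to 1; higher ignores more background noise
  prefix_padding_ms: number; // audio retained before detected speech
  silence_duration_ms: number; // silence needed before the turn ends
}

// Defaults sit inside the suggested ranges (0.55 to 0.65, 400 to 650 ms).
function phoneVad(threshold = 0.6, silenceMs = 500): ServerVadConfig {
  if (threshold < 0 || threshold > 1) throw new RangeError("threshold must be between 0 and 1");
  if (silenceMs <= 0) throw new RangeError("silenceMs must be positive");
  return {
    type: "server_vad",
    threshold,
    prefix_padding_ms: 300, // assumption, tune alongside the other values
    silence_duration_ms: silenceMs,
  };
}

const vadUpdate = {
  type: "session.update",
  session: { turn_detection: phoneVad(0.6, 500) },
};
```

Raising the threshold or the silence duration moves the assistant toward the conservative end described above, and lowering them moves it toward the eager end, so it helps to change one value at a time between test calls.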
Security And Logging
- Validate the X-Twilio-Signature header on both the inbound webhook and the WebSocket upgrade request.
- Persist key milestones like webhook accepted, stream started, first audio sent, tool called, and call closed.
- Keep transcript and debug logs separate from the final call record so you can inspect failures without changing the schema of the main call table.
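A sketch of X-Twilio-Signature validation following Twilio's documented scheme. Start with the full URL Twilio requested, append each POST parameter as name then value in sorted name order, compute an HMAC-SHA1 keyed with the account auth token, and base64-encode the result. A WebSocket upgrade is a GET, so it carries no POST parameters there. The function names are hypothetical. The sketch uses node:crypto, which in Workers needs the nodejs_compat flag; Web Crypto is an alternative. The URL must match exactly what Twilio called, including scheme and host, which matters behind proxies.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Build the signed string (URL + sorted name/value pairs) and sign it.
function computeTwilioSignature(
  authToken: string,
  url: string,
  params: Record<string, string> = {},
): string {
  const data = Object.keys(params)
    .sort()
    .reduce((acc, key) => acc + key + params[key], url);
  return createHmac("sha1", authToken).update(data, "utf8").digest("base64");
}

// Constant-time comparison against the header value; null means missing header.
function isValidTwilioSignature(
  authToken: string,
  url: string,
  params: Record<string, string>,
  header: string | null,
): boolean {
  if (!header) return false;
  const expected = Buffer.from(computeTwilioSignature(authToken, url, params));
  const actual = Buffer.from(header);
  return expected.length === actual.length && timingSafeEqual(expected, actual);
}
```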
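A sketch of milestone tracking kept apart from the final call record. MilestoneLog and the snake_case milestone names are hypothetical renderings of the milestones listed above. Rows are held in memory here. Persisting them, for example to a dedicated table, is left out.

```typescript
// Milestones from the notes, in the order a healthy call usually reaches them.
const MILESTONES = [
  "webhook_accepted",
  "stream_started",
  "first_audio_sent",
  "tool_called",
  "call_closed",
] as const;
type Milestone = (typeof MILESTONES)[number];

interface MilestoneRow {
  callSid: string;
  milestone: Milestone;
  at: number; // epoch milliseconds
}

// In-memory stand-in for a milestones table separate from the call record.
class MilestoneLog {
  private rows: MilestoneRow[] = [];

  mark(callSid: string, milestone: Milestone, at = Date.now()): void {
    this.rows.push({ callSid, milestone, at });
  }

  // Milestones a call never reached, in the usual order; useful for
  // spotting where a failed call stopped.
  missing(callSid: string): Milestone[] {
    const reached = new Set<Milestone>(
      this.rows.filter((r) => r.callSid === callSid).map((r) => r.milestone),
    );
    return MILESTONES.filter((m) => !reached.has(m));
  }
}
```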