Audio

Transcription, realtime voice sessions, and org audio settings.

Audio routes power voice mode in the lab, file transcription, and org-level voice defaults. See Voice for product behavior and Telephony API for phone-call routing.

Endpoints

| Method | Path | Description |
| --- | --- | --- |
| POST | /api/v1/audio/transcriptions | Transcribe uploaded or referenced audio to text. |
| POST | /api/v1/audio/realtime-session | Create an OpenAI Realtime voice session for an agent turn. |
| POST | /api/v1/audio/realtime-connect | Connect an existing realtime session (SIP/webhook follow-up). |
| GET | /api/v1/settings/audio | Read org audio defaults (realtime model, voice catalog selections). |
| PATCH | /api/v1/settings/audio | Update org audio settings. |

Realtime voice

POST /api/v1/audio/realtime-session returns the session material the browser voice UI needs to stream microphone input and play model audio back. Transcripts and tool calls are persisted to the regular chat session, just as in text mode.
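
As a rough illustration of calling this route, the sketch below builds (but does not send) the POST request using only the Python standard library. The host, the bearer-token header, and the `agent_id` body field are assumptions made for the example; this page does not document the request schema, so check the actual contract before relying on any of them.

```python
import json
import urllib.request

# Hypothetical placeholder host; not taken from the docs.
BASE_URL = "https://example.invalid"


def build_realtime_session_request(token: str, agent_id: str) -> urllib.request.Request:
    """Build a POST request for the realtime-session route without sending it.

    Assumptions: bearer-token auth and an "agent_id" JSON field. Both are
    illustrative guesses, not documented parameters.
    """
    body = json.dumps({"agent_id": agent_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/audio/realtime-session",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Building the request separately keeps the sketch testable without network access.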

POST /api/v1/audio/realtime-connect completes provider-specific connect steps after session creation (for example SIP or webhook-backed telephony flows).

Transcription

POST /api/v1/audio/transcriptions converts audio files or recordings into text for chat attachments, file tools, and non-realtime flows.
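
A file upload to this route might look like the following sketch, which hand-assembles a multipart/form-data body with the standard library. The form field name `file`, the host, and the bearer-token header are assumptions for illustration only. The route may instead expect a different field name or a JSON body that references stored audio.

```python
import uuid
import urllib.request

# Hypothetical placeholder host; not taken from the docs.
BASE_URL = "https://example.invalid"


def build_transcription_request(
    token: str,
    filename: str,
    audio_bytes: bytes,
    content_type: str = "audio/wav",
) -> urllib.request.Request:
    """Build a multipart POST for the transcriptions route without sending it.

    Assumption: the audio is uploaded in a form field named "file".
    """
    boundary = uuid.uuid4().hex
    # Part header for the single file field.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    # Closing boundary that terminates the multipart body.
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/audio/transcriptions",
        data=head + audio_bytes + tail,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```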

Settings

Org audio settings apply defaults across agents unless an agent overrides its voice profile on the detail page. Read and update them through GET/PATCH /api/v1/settings/audio.
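
The read/update pair and the override rule can be sketched as below. The setting keys (`voice`, `realtime_model`), the host, and the bearer-token header are hypothetical. The merge function only illustrates the "org defaults unless the agent overrides" rule described above and is not the service's actual resolution logic.

```python
import json
import urllib.request
from typing import Optional

# Hypothetical placeholder host; not taken from the docs.
BASE_URL = "https://example.invalid"


def build_audio_settings_request(
    token: str, changes: Optional[dict] = None
) -> urllib.request.Request:
    """Build a GET (no changes) or PATCH (with changes) for org audio settings."""
    method = "GET" if changes is None else "PATCH"
    data = None if changes is None else json.dumps(changes).encode("utf-8")
    headers = {"Authorization": f"Bearer {token}"}
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/settings/audio",
        data=data,
        method=method,
        headers=headers,
    )


def effective_audio_settings(org_defaults: dict, agent_overrides: dict) -> dict:
    """Illustrate the override rule: agent values win, org defaults fill the rest."""
    return {**org_defaults, **agent_overrides}
```

For example, an agent that overrides only its voice would still inherit the org-level realtime model under this merge.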