Audio
Transcription, realtime voice sessions, and org audio settings.
Audio routes power voice mode in the lab, file transcription, and org-level voice defaults. See Voice for product behavior and Telephony API for phone-call routing.
Endpoints
| Method | Path | Description |
|---|---|---|
| POST | /api/v1/audio/transcriptions | Transcribe uploaded or referenced audio to text. |
| POST | /api/v1/audio/realtime-session | Create an OpenAI Realtime voice session for an agent turn. |
| POST | /api/v1/audio/realtime-connect | Connect an existing realtime session (SIP/webhook follow-up). |
| GET | /api/v1/settings/audio | Read org audio defaults (realtime model, voice catalog selections). |
| PATCH | /api/v1/settings/audio | Update org audio settings. |
Realtime voice
POST /api/v1/audio/realtime-session returns the session material the browser voice UI needs to stream microphone input and play model audio back. Transcripts and tool calls are persisted into the regular chat session, just as they are in text mode.
POST /api/v1/audio/realtime-connect completes provider-specific connect steps after session creation (for example SIP or webhook-backed telephony flows).
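The two realtime calls could be prepared roughly as follows. This is a minimal sketch, not the documented request contract: the host `example.invalid`, the bearer-token header, and the body fields `agent_id` and `session_id` are all assumptions made for illustration, since this page does not specify the request schema. The functions only build the requests; nothing is sent.

```python
import json
from urllib.request import Request

# Hypothetical host; replace with your deployment's base URL.
BASE_URL = "https://example.invalid"


def build_json_post(path: str, payload: dict, token: str) -> Request:
    """Build (but do not send) a JSON POST request for the given path."""
    return Request(
        url=BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Assumed auth scheme; the source does not describe authentication.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def realtime_session_request(agent_id: str, token: str) -> Request:
    """Step 1: ask for session material for an agent turn.

    The `agent_id` field name is an assumption.
    """
    return build_json_post(
        "/api/v1/audio/realtime-session", {"agent_id": agent_id}, token
    )


def realtime_connect_request(session_id: str, token: str) -> Request:
    """Step 2: complete provider-specific connect steps for an existing session.

    The `session_id` field name is an assumption.
    """
    return build_json_post(
        "/api/v1/audio/realtime-connect", {"session_id": session_id}, token
    )
```

Passing either request to `urllib.request.urlopen` would send it. The ordering (create the session first, then connect) follows the description above; the response shape is not covered here.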
Transcription
POST /api/v1/audio/transcriptions converts audio files or recordings into text for chat attachments, file tools, and non-realtime flows.
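As an illustration of the referenced-audio case, a transcription request might be built like this. The host, the bearer-token header, and the `audio_url` body field are hypothetical, as this page does not define the request body. The upload variant is not sketched because its encoding is not described here.

```python
import json
from urllib.request import Request

# Hypothetical host; replace with your deployment's base URL.
BASE_URL = "https://example.invalid"
TRANSCRIPTIONS_PATH = "/api/v1/audio/transcriptions"


def transcription_request(audio_url: str, token: str) -> Request:
    """Build (but do not send) a request to transcribe referenced audio.

    The `audio_url` field name and the auth header are assumptions.
    """
    body = json.dumps({"audio_url": audio_url}).encode("utf-8")
    return Request(
        url=BASE_URL + TRANSCRIPTIONS_PATH,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```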
Settings
Org audio settings apply as defaults across all agents unless an agent overrides its voice profile on its detail page. Read them with GET /api/v1/settings/audio and update them with PATCH on the same path.
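The default-and-override behavior can be pictured with a small sketch. The field names `realtime_model` and `voice`, the host, and the auth header are assumptions for illustration only, and the merge function models the described precedence rather than the server's actual logic.

```python
import json
from typing import Optional
from urllib.request import Request

# Hypothetical host; replace with your deployment's base URL.
BASE_URL = "https://example.invalid"
SETTINGS_PATH = "/api/v1/settings/audio"


def settings_request(token: str, changes: Optional[dict] = None) -> Request:
    """Build a GET (no changes) or PATCH (with changes) settings request.

    Nothing is sent; the auth header is an assumed scheme.
    """
    headers = {"Authorization": f"Bearer {token}"}
    if changes is None:
        return Request(BASE_URL + SETTINGS_PATH, headers=headers, method="GET")
    headers["Content-Type"] = "application/json"
    return Request(
        BASE_URL + SETTINGS_PATH,
        data=json.dumps(changes).encode("utf-8"),
        headers=headers,
        method="PATCH",
    )


def effective_voice_profile(org_defaults: dict, agent_override: Optional[dict]) -> dict:
    """Org defaults apply unless the agent sets its own value for a field."""
    overrides = {k: v for k, v in (agent_override or {}).items() if v is not None}
    return {**org_defaults, **overrides}
```

For example, with org defaults `{"realtime_model": "m1", "voice": "a"}` and an agent override `{"voice": "b"}`, `effective_voice_profile` returns `{"realtime_model": "m1", "voice": "b"}`; an agent with no override simply gets the org defaults.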