Audio - SambaNova Documentation

For developers requiring audio support, SambaNova provides OpenAI’s Whisper large-v3 model, which enables real-time transcriptions and translations.

Whisper-Large-v3

Model: Whisper-Large-v3
Description: State-of-the-art automatic speech recognition (ASR) and translation model. Developed by OpenAI and trained on more than 5 million hours of weakly labeled and pseudo-labeled audio. Excels in multilingual and zero-shot speech tasks across diverse domains.
Model ID: Whisper-Large-v3
Supported languages: Multilingual

Parameter	Type	Description	Required / Default	Endpoints
`model`	String	The ID of the model to use.	Required	`transcriptions`, `translations`
`file`	File	Audio file in FLAC, MP3, MP4, MPEG, MPGA, M4A, Ogg, WAV, or WebM format. File size limit: 25MB.	Required	`transcriptions`, `translations`
`prompt`	String	Prompt to influence transcription style or vocabulary. Example: “Please transcribe carefully, including pauses and hesitations.”	Optional	`transcriptions`, `translations`
`response_format`	String	Output format: either `json` or `text`.	`json`	`transcriptions`, `translations`
`language`	String	The language of the input audio. Supplying it as an ISO 639-1 code (e.g., `en`) improves accuracy and latency.	Optional	`transcriptions`, `translations`
`stream`	Boolean	Enables streaming responses.	`false`	`transcriptions`, `translations`
`stream_options`	Object	Additional streaming configuration (e.g., `{"include_usage": true}`).	Optional	`transcriptions`, `translations`
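
The constraints in the table above can be checked on the client before a file is uploaded. The following is a minimal sketch of that idea, not part of any SambaNova SDK: the helper name `build_audio_form_fields`, its signature, and the interpretation of "25MB" as 25 × 1024 × 1024 bytes are all assumptions made for illustration. It validates the file extension, file size, and `response_format`, then returns the non-file request fields, leaving out optional parameters that were not supplied.

```python
import os

# Values taken from the parameter table; treating "25MB" as 25 MiB is an assumption.
MAX_FILE_BYTES = 25 * 1024 * 1024
SUPPORTED_EXTENSIONS = {
    "flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm",
}
RESPONSE_FORMATS = {"json", "text"}


def build_audio_form_fields(file_name, size_bytes, model="Whisper-Large-v3",
                            prompt=None, response_format="json",
                            language=None, stream=False, stream_options=None):
    """Hypothetical helper: validate inputs and return the non-file fields."""
    # Compare the extension case-insensitively against the supported formats.
    extension = os.path.splitext(file_name)[1].lstrip(".").lower()
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported audio format: {extension!r}")
    if size_bytes > MAX_FILE_BYTES:
        raise ValueError(f"file is {size_bytes} bytes; limit is {MAX_FILE_BYTES}")
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"response_format must be one of {sorted(RESPONSE_FORMATS)}")

    # Required parameter plus the two that have documented defaults.
    fields = {"model": model, "response_format": response_format, "stream": stream}

    # Optional parameters are included only when the caller provides them.
    if prompt is not None:
        fields["prompt"] = prompt
    if language is not None:
        fields["language"] = language
    if stream_options is not None:
        fields["stream_options"] = stream_options
    return fields
```

The returned dictionary could then be sent, together with the audio under the `file` key, to the `transcriptions` or `translations` endpoint using whichever HTTP client you prefer. Validating locally in this way is only a convenience; the service remains the authority on what it accepts.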