SambaStack supports a variety of models that can be deployed to both on-premises and hosted environments. Contact your system administrator to determine which models are available on your deployment. You can also use the Model list API command to view which models are deployed and available for your use.
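The Model list lookup mentioned above can be scripted. The sketch below assumes the deployment exposes an OpenAI-compatible model-list endpoint (`GET /v1/models`) whose response has the shape `{"object": "list", "data": [{"id": ...}, ...]}`; the sample payload and model IDs are illustrative, not a captured SambaStack response.

```python
import json

# Hypothetical response body from an OpenAI-compatible GET /v1/models call.
# The shape (a "list" object with a "data" array of model entries) is an
# assumption; check your deployment's actual Model list API response.
sample_response = json.dumps({
    "object": "list",
    "data": [
        {"id": "Meta-Llama-3.3-70B-Instruct", "object": "model"},
        {"id": "DeepSeek-R1-0528", "object": "model"},
    ],
})

def deployed_model_ids(raw: str) -> list[str]:
    """Extract the model IDs from a model-list response body."""
    return [entry["id"] for entry in json.loads(raw).get("data", [])]

print(deployed_model_ids(sample_response))
```

Pointing the same parsing at your deployment's live endpoint then tells you which of the models in the table below are actually available to you.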
Deployment options
When deploying models in SambaStack, administrators can select from various context length and batch size combinations.

- Smaller batch sizes provide higher token throughput (tokens/second).
- Larger batch sizes provide better concurrency for multiple users.
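To make the tradeoff above concrete, the toy model below assumes a fixed aggregate decode rate shared roughly evenly across the concurrent requests in a batch. The rate figure is an invented illustration, not a SambaStack benchmark.

```python
# Toy model of the batch-size tradeoff: the deployment decodes a fixed
# aggregate number of tokens per second, split across concurrent requests.
# AGGREGATE_TOKENS_PER_SEC is an assumed figure for illustration only.
AGGREGATE_TOKENS_PER_SEC = 1000.0

def per_request_throughput(batch_size: int) -> float:
    """Approximate tokens/second each user sees at a given batch size."""
    return AGGREGATE_TOKENS_PER_SEC / batch_size

for bs in (1, 4, 16):
    print(f"batch size {bs:>2}: ~{per_request_throughput(bs):.0f} tokens/s per request")
```

A batch size of 1 gives one user the full decode rate, while a batch size of 16 serves 16 users concurrently at a lower per-request rate, which is the choice administrators make when selecting a deployment configuration.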
Supported models
The table below lists supported models, context lengths, batch sizes, and features.

| Developer/Model ID | Type | Context length (batch size) | Features and optimizations | View on Hugging Face |
|---|---|---|---|---|
| Meta | | | | |
| Meta-Llama-3.3-70B-Instruct | Text | View | View | Model card |
| Meta-Llama-3.1-8B-Instruct | Text | View | View | Model card |
| Llama-4-Maverick-17B-128E-Instruct | Image, Text | View | View | Model card |
| DeepSeek | | | | |
| DeepSeek-R1-0528 | Reasoning, Text | View | View | Model card |
| DeepSeek-R1-Distill-Llama-70B | Reasoning, Text | View | View | Model card |
| DeepSeek-V3-0324 | Text | View | View | Model card |
| DeepSeek-V3.1 | Reasoning, Text | View | View | Model card |
| OpenAI | | | | |
| Whisper-Large-v3 | Audio | View | View | Model card |
| Qwen | | | | |
| Qwen3-32B | Reasoning, Text | View | View | Model card |
| Tokyotech-llm | | | | |
| Llama-3.3-Swallow-70B-Instruct-v0.4 | Text | View | View | Model card |
| Other | | | | |
| E5-Mistral-7B-Instruct | Embedding | View | View | Model card |