Models & API Overview
The landscape of production-ready models available today — proprietary and open-source.
Proprietary Frontier Models
| Model | Provider | Context | Key Strengths |
|---|---|---|---|
| GPT-4o | OpenAI | 128K | Vision, function calling, broad capability |
| Claude Sonnet 4.5 | Anthropic | 200K | Long context, reasoning, code, instruction following |
| Gemini 1.5 Pro | 1M+ | Massive context window, native video understanding |
Best For:
- GPT-4o: Production apps, multimodal pipelines.
- Claude Sonnet 4.5: Document analysis, agentic tasks, safety-critical applications.
- Gemini 1.5 Pro: Full codebase analysis, long video QA.
Open-source Models via Ollama / vLLM
- Llama 3 8B (Meta): Ideal for self-hosting on consumer hardware.
- Llama 3 70B (Meta): High performance requiring a GPU cluster.
- Mistral 7B (Mistral AI): Highly efficient for its size.
Spin up Llama 3 locally with Ollama
# Install Ollama
curl -fsSL [https://ollama.ai/install.sh](https://ollama.ai/install.sh) | sh
# Pull and run Llama 3 8B
ollama pull llama3:8b
ollama run llama3:8b
# Or use the OpenAI-compatible API
ollama serve # starts on http://localhost:11434
Use Ollama with OpenAI-compatible SDK
from openai import OpenAI
# Point to local Ollama server
client = OpenAI(
base_url="http://localhost:11434/v1",
api_key="ollama"
)
response = client.chat.completions.create(
model="llama3:8b",
messages=[{
"role": "user",
"content": "Explain attention in one paragraph."
}]
)
print(response.choices[0].message.content)