Tools & SDKs
The essential Python ecosystem for building production LLM applications.
- OpenAI SDK: Official Python client for GPT-4o and o-series models. Also works with any OpenAI-compatible endpoint (Ollama, vLLM, Groq); see the serving sketch after this list.
- Anthropic SDK: Official client for Claude models. Includes streaming, tool use, vision, and document handling. Async support via AsyncAnthropic.
- Pydantic v2: Type validation and serialization. Define data shapes; Pydantic validates LLM outputs against them and provides clear error messages for retries (see the validation sketch after this list).
- Instructor: Wraps any LLM SDK with automatic Pydantic validation and retry logic. Converts structured output from "hard" to "trivial."
- LangChain / LlamaIndex: Higher-level frameworks for chains, agents, RAG pipelines. Useful for prototyping; prefer direct SDK calls in production for control.
- vLLM: High-throughput inference server for open-source models. PagedAttention delivers up to ~24× the throughput of naive HF-style serving, per the vLLM paper; the serving sketch below shows the basic setup.
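As a quick illustration of the OpenAI-compatible pattern: serve an open model with vLLM, then query it with the official OpenAI client. A minimal sketch; the model name and port are assumptions, so substitute whatever your server actually exposes.

# Launch an OpenAI-compatible server (vLLM listens on port 8000 by default):
#   vllm serve meta-llama/Llama-3.1-8B-Instruct

from openai import OpenAI

# Local servers typically skip auth, but the client requires some api_key string.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # must match the served model
    messages=[{"role": "user", "content": "What does OBD-II code P0303 mean?"}],
)
print(resp.choices[0].message.content)

The same client setup works for Ollama (base_url="http://localhost:11434/v1") or Groq; only the base_url and model name change.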
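And a minimal sketch of what Pydantic's validation buys you: an out-of-range value fails loudly, and the error text is exactly what you feed back to the model on retry. The Diagnosis model here is a hypothetical toy.

from pydantic import BaseModel, Field, ValidationError

class Diagnosis(BaseModel):
    confidence: float = Field(ge=0.0, le=1.0)

# Simulate a malformed model response.
try:
    Diagnosis.model_validate_json('{"confidence": 1.7}')
except ValidationError as e:
    # Prints a readable message ("Input should be less than or equal to 1")
    # that you can append to the retry prompt.
    print(e)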
Starter installation
pip install anthropic openai pydantic instructor tiktoken python-dotenv
Complete working example — fault classification pipeline
End-to-end: fault log → structured diagnosis
import anthropic
import instructor
from dotenv import load_dotenv
from pydantic import BaseModel, Field
from typing import Literal

load_dotenv()  # load .env so Anthropic() can pick up ANTHROPIC_API_KEY


class FaultReport(BaseModel):
    fault_codes: list[str] = Field(description="OBD-II codes found")
    bucket: Literal["Cylinder Issues", "Turbo", "Bad Sensor", "Unknown"]
    root_cause: str = Field(description="Most likely root cause in one sentence")
    confidence: float = Field(ge=0.0, le=1.0)
    action: str = Field(description="Recommended next action")
    reasoning: str = Field(description="Step-by-step diagnostic reasoning")


# Instructor patches the Anthropic client so that messages.create
# accepts a response_model and validates the reply against it.
client = instructor.from_anthropic(anthropic.Anthropic())

SYSTEM = """You are an expert diesel engine fault analyst. Analyze fault logs step by step.
Always explain your reasoning before classifying. Use OBD-II knowledge and cross-reference symptoms."""


def diagnose(fault_log: str) -> FaultReport:
    return client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1024,
        system=SYSTEM,
        messages=[{"role": "user", "content": fault_log}],
        response_model=FaultReport,
    )
# Example usage
report = diagnose("""
P0303: Cylinder 3 misfire (count: 142 in last 10min)
Exhaust temp sensor 3: +47°C above baseline
Fuel trim bank 1: +9.2% (long term)
Engine hours: 8,423
Last service: 7,900h (injectors replaced)
""")
print(f"Bucket: {report.bucket}")
print(f"Confidence: {report.confidence:.0%}")
print(f"Root cause: {report.root_cause}")
print(f"Action: {report.action}")
Next steps
With the foundations in place, explore agentic patterns (tool use, multi-agent systems), RAG for knowledge augmentation, and evals for measuring prompt quality systematically. The real leverage is in measuring your prompts, not just writing them.