One API for every AI provider.
Loom unifies AI providers behind a single interface with centralized routing, retries, caching, batching, observability, and cost optimization — while preserving every vendor's native capabilities.
Everything you need to run AI in production.
Twelve infrastructure primitives in one Python package. Production-ready since v1.0, semver-stable, observable end-to-end.
Unified AI API
One stable contract, generate(provider, modality, model, prompt), across every vendor and modality.
Native SDK preservation
Each vendor integrated with its native SDK. Prompt caching, grounding, streaming — preserved, not flattened.
Smart model routing
Cheap-first router with caller-supplied validators. Tries candidates in order, returns the first that passes.
Centralized retries
Exponential backoff with jitter, configurable per client. RetryPolicy(max_attempts=3, base_delay=0.5).
Vendor failover
Cross-vendor failover via a bundled equivalence map. When OpenAI is down, Anthropic answers instead.
Prompt caching
Vendor-native prompt caching for OpenAI, Anthropic, DeepSeek, and Gemini, with cache discounts already reflected in the reported cost.
Batch processing
OpenAI and Anthropic batch endpoints behind a single submit_batch() with poll and wait primitives.
Request deduplication
Single-flight coalescing collapses identical concurrent calls into one upstream request.
Cost tracking
Every result carries cost.usd and cost.local computed from the catalog. Pricing tracked per model.
Unified observability
Structured INFO line per call: provider, model, latency, tokens, cost. SQLite sink + Flask dashboard.
API key centralization
AWS Secrets Manager, GCP Secret Manager, and HashiCorp Vault backends in the box. Keys live in one place.
OpenAI-compatible adapters
Register any OpenAI-shape vendor in ~10 lines with register_openai_compatible(key, label, base_url, api_key_env).
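To make the "exponential backoff with jitter" primitive concrete, here is a minimal standalone sketch. It is not Loom's implementation: `SketchRetryPolicy`, `backoff_delay`, and `call_with_retries` are hypothetical names, the field names only mirror the `RetryPolicy(max_attempts=3, base_delay=0.5)` example above, and the "full jitter" formula is an assumed choice.

```python
import random
import time
from dataclasses import dataclass
from typing import Callable, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class SketchRetryPolicy:
    """Illustrative stand-in; not Loom's RetryPolicy."""
    max_attempts: int = 3
    base_delay: float = 0.5  # seconds


def backoff_delay(policy: SketchRetryPolicy, attempt: int,
                  rng: Callable[[], float] = random.random) -> float:
    """Full-jitter delay before the retry that follows 0-based `attempt`."""
    cap = policy.base_delay * (2 ** attempt)  # 0.5s, 1s, 2s, ...
    return rng() * cap                        # uniform in [0, cap)


def call_with_retries(fn: Callable[[], T], policy: SketchRetryPolicy,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn until it succeeds or max_attempts is exhausted."""
    if policy.max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(policy.max_attempts):
        try:
            return fn()
        except Exception:
            # A real client would retry only transient errors (timeouts, 429s).
            if attempt + 1 == policy.max_attempts:
                raise  # out of attempts: surface the last error
            sleep(backoff_delay(policy, attempt))
    raise AssertionError("unreachable")
```

Randomizing each delay spreads retries from many clients over time, so they do not all hit a recovering vendor at the same instant.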
One control plane for every AI call.
Cost, latency, retries, cache hits, and provider health — captured for every request, surfaced in a single dashboard.
The path of a single generate() call.
Every request flows the same way — through cache, router, and adapter to the upstream vendor, then back with cost and a structured log. Optimization happens once, in the middle. New providers plug in without touching consuming code.
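The flow described above can be sketched in a few lines. This is an illustration under stated assumptions, not Loom's internals: `SketchPipeline` and the adapter signature are hypothetical, the cache is a plain in-memory dict, and routing and cost calculation are left out to keep the path readable.

```python
import logging
import time
from typing import Callable, Dict, Tuple

logger = logging.getLogger("loom_sketch")

# Hypothetical adapter signature: (model, prompt) -> response dict.
Adapter = Callable[[str, str], dict]


class SketchPipeline:
    """Cache -> adapter -> structured log, in that order."""

    def __init__(self, adapters: Dict[str, Adapter]) -> None:
        self._adapters = adapters
        self._cache: Dict[Tuple[str, str, str], dict] = {}

    def generate(self, provider: str, model: str, prompt: str) -> dict:
        key = (provider, model, prompt)
        cached = self._cache.get(key)
        if cached is not None:
            # Cache hit: no upstream call is made.
            logger.info("provider=%s model=%s cache=hit", provider, model)
            return cached

        adapter = self._adapters[provider]  # one adapter per vendor
        started = time.perf_counter()
        result = adapter(model, prompt)
        latency_ms = (time.perf_counter() - started) * 1000.0

        # One structured line per upstream call.
        logger.info(
            "provider=%s model=%s cache=miss latency_ms=%.1f tokens=%s",
            provider, model, latency_ms,
            result.get("usage", {}).get("total_tokens"),
        )
        self._cache[key] = result
        return result
```

In this sketch, adding a vendor means adding one entry to the `adapters` dict; code that calls `generate()` does not change.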
Upstream AI providers
Designed for engineers who ship.
One stable function call per workflow. Every code path returns the same response shape — text, usage, cost — across every vendor.
import loom

result = loom.generate(
    provider="openai",
    modality="text",
    model="gpt-4o-mini",
    prompt="Summarize this transcript in five bullets.",
)

print(result["text"])
print(result["cost"]["usd"])  # 0.0000038
print(result["usage"]["total_tokens"])
from fastapi import FastAPI
from loom import AsyncLoom

app = FastAPI()
aclient = AsyncLoom.from_env()

@app.get("/answer")
async def answer(q: str):
    result = await aclient.generate(
        provider="openai",
        modality="text",
        model="gpt-4o-mini",
        prompt=q,
    )
    return {"text": result["text"], "cost": result["cost"]["usd"]}
from loom import Loom, Router, Candidate

router = Router(
    candidates=[
        ("openai", "text", "gpt-4o-mini"),
        Candidate("anthropic", "text", "claude-haiku-4-5"),
        ("openai", "text", "gpt-4o", {"temperature": 0.2}),
    ],
    validator=lambda r: len(r["text"]) > 40,
)

client = Loom.from_env()
result = client.route(router, prompt="Explain quantum entanglement.")

result["_router"]["used"]   # which model won
result["_router"]["tried"]  # all attempted
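The cheap-first behavior in the router example ("tries candidates in order, returns the first that passes") reduces to a short loop. The sketch below is an assumption about the general technique, not Loom's Router: `route_cheap_first` and the `call` parameter are hypothetical, and the `_router` payload shape is guessed from the two keys shown above.

```python
from typing import Callable, List, Sequence, Tuple

Target = Tuple[str, str]  # (provider, model); hypothetical shape


def route_cheap_first(
    candidates: Sequence[Target],
    call: Callable[[str, str], dict],
    validator: Callable[[dict], bool],
) -> dict:
    """Try candidates in the given order; return the first valid result."""
    tried: List[Target] = []
    for provider, model in candidates:
        tried.append((provider, model))
        result = call(provider, model)
        if validator(result):
            # Attach routing metadata alongside the normal response fields.
            return {**result, "_router": {"used": (provider, model),
                                          "tried": list(tried)}}
    raise RuntimeError(f"no candidate passed validation; tried {tried}")
```

Ordering the list from cheapest to most expensive means the costlier models are called only when a cheaper answer fails the caller's validator.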
from loom import Catalog

c = Catalog.from_yaml("models.yaml")

# Register a new OpenAI-shape vendor in ~10 lines:
c.register_openai_compatible(
    key="newco",
    label="NewCo AI",
    base_url="https://api.newco.ai/v1",
    api_key_env="NEWCO_API_KEY",
)
c.register_model(
    provider="newco",
    model_id="newco-large",
    upstream_model="newco-large-2026-01",
    input_cost_per_1m=2.50,
    output_cost_per_1m=10.00,
)
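The per-1M-token prices registered above suggest how a per-call cost could be derived from token usage. The function below is a hypothetical illustration of that arithmetic, assuming prices are in USD; it ignores prompt-cache and batch discounts and is not Loom's cost code.

```python
def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_cost_per_1m: float,
    output_cost_per_1m: float,
) -> float:
    """Cost in USD when prices are quoted per one million tokens."""
    total = input_tokens * input_cost_per_1m + output_tokens * output_cost_per_1m
    return total / 1_000_000


# 1,200 input tokens and 300 output tokens at the newco-large prices:
# (1200 * 2.50 + 300 * 10.00) / 1,000,000 = 0.006 USD
cost = estimate_cost_usd(1200, 300, input_cost_per_1m=2.50, output_cost_per_1m=10.00)
print(f"{cost:.4f}")  # 0.0060
```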
from loom import Loom, BatchRequest

client = Loom.from_env()

handle = client.submit_batch([
    BatchRequest(provider="openai", modality="text",
                 model="gpt-4o-mini",
                 prompt="summarize row 1", custom_id="row-1"),
    BatchRequest(provider="openai", modality="text",
                 model="gpt-4o-mini",
                 prompt="summarize row 2", custom_id="row-2"),
])

print(handle.id, handle.status())
results = handle.wait(poll_interval=60.0, timeout=24 * 3600)
Built-in savings on every call.
Response cache, vendor prompt cache, smart routing, and batch APIs — orchestrated automatically. Targets validated in production.
Build AI infrastructure once.
Unify providers, optimize costs, and ship AI products faster. One install. Every vendor. Production from day one.