v1.0 · semver stable

One API for every AI provider.

Loom unifies AI providers behind a single interface with centralized routing, retries, caching, batching, observability, and cost optimization — while preserving every vendor's native capabilities.

Python · Sync + Async · MIT
router · cheap-first
p95 · 312ms
14+ AI providers · OpenAI, Anthropic, Gemini, xAI, Mistral, DeepSeek, BFL, Ideogram…
1 unified SDK · A single stable contract across text and image modalities.
Smart routing engine · Cheap-first, validator-driven, with cross-vendor failover.
100% observability · Cost, latency, tokens, and retries captured on every call.

Infrastructure

Everything you need to run AI in production.

Twelve infrastructure primitives in one Python package. Production-ready since v1.0, semver-stable, observable end-to-end.

01

Unified AI API

One stable contract across every vendor and modality: generate(provider, modality, model, prompt).

02

Native SDK preservation

Each vendor integrated with its native SDK. Prompt caching, grounding, streaming — preserved, not flattened.

03

Smart model routing

Cheap-first router with caller-supplied validators. Tries candidates in order, returns the first that passes.
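The cheap-first loop can be sketched in plain Python. This is a hypothetical illustration, not Loom's implementation: `route_cheap_first` and the stubbed `call` function are assumptions, and the `_router` keys simply mirror the `used` and `tried` fields shown in the routing example further down this page.

```python
from typing import Callable, Iterable


def route_cheap_first(
    candidates: Iterable[tuple],
    call: Callable[..., dict],
    validator: Callable[[dict], bool],
    prompt: str,
) -> dict:
    """Try candidates in order (cheapest first) and return the first valid result.

    Each candidate is (provider, modality, model) or
    (provider, modality, model, params).
    """
    tried = []
    last_error = None
    for candidate in candidates:
        provider, modality, model, *rest = candidate
        params = rest[0] if rest else {}
        tried.append((provider, model))
        try:
            result = call(provider=provider, modality=modality, model=model,
                          prompt=prompt, **params)
        except Exception as exc:  # upstream error: move on to the next candidate
            last_error = exc
            continue
        if validator(result):  # the first response that passes wins
            result["_router"] = {"used": (provider, model), "tried": list(tried)}
            return result
    raise RuntimeError(f"no candidate passed validation; tried {tried}") from last_error


# Usage with a stubbed upstream: the cheap model answers too briefly,
# so the router falls through to the second candidate.
def fake_call(provider, modality, model, prompt, **params):
    return {"text": "Too short." if model == "gpt-4o-mini" else "x" * 60}


result = route_cheap_first(
    [("openai", "text", "gpt-4o-mini"), ("anthropic", "text", "claude-haiku-4-5")],
    fake_call,
    validator=lambda r: len(r["text"]) > 40,
    prompt="Explain quantum entanglement.",
)
print(result["_router"])
```

The sketch treats an upstream exception and a failed validation the same way (try the next candidate); a production router may want to distinguish them.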

04

Centralized retries

Exponential backoff with jitter, configurable per client, e.g. RetryPolicy(max_attempts=3, base_delay=0.5).
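A minimal sketch of exponential backoff with "full jitter". This is not Loom's `RetryPolicy`: the `BackoffPolicy` class, its `max_delay` cap, the `call_with_retry` helper, and the default set of retryable errors are hypothetical, and the `sleep`/`rng` parameters exist only so the schedule can be tested without actually waiting.

```python
import random
import time
from dataclasses import dataclass
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


@dataclass
class BackoffPolicy:
    max_attempts: int = 3     # total attempts, including the first call
    base_delay: float = 0.5   # seconds; the delay ceiling doubles after each failure
    max_delay: float = 8.0    # hypothetical cap on any single sleep

    def delay(self, attempt: int, rng: random.Random) -> float:
        # Full jitter: sleep a uniform random time in [0, min(cap, base * 2**attempt)].
        ceiling = min(self.max_delay, self.base_delay * (2 ** attempt))
        return rng.uniform(0.0, ceiling)


def call_with_retry(
    fn: Callable[[], T],
    policy: BackoffPolicy,
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    if policy.max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    rng = rng or random.Random()
    for attempt in range(policy.max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == policy.max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            sleep(policy.delay(attempt, rng))
    raise AssertionError("unreachable")
```

Randomizing the sleep spreads retries from many clients so they do not hit a recovering vendor in lockstep; only transient error types are retried, so a malformed request fails fast.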

05

Vendor failover

Cross-vendor failover via a bundled equivalence map. When OpenAI is down, Anthropic answers instead.
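A minimal sketch of equivalence-map failover, assuming a `call(provider, model, prompt)` function that raises on vendor outages. The map entries, the `ProviderDown` exception, the `generate_with_failover` helper, and the `served_by` key are hypothetical; Loom's bundled map is not reproduced here.

```python
from typing import Callable, Dict, List, Tuple

Target = Tuple[str, str]  # (provider, model)

# Hypothetical equivalence map: primary model -> ordered list of substitutes.
EQUIVALENTS: Dict[Target, List[Target]] = {
    ("openai", "gpt-4o-mini"): [("anthropic", "claude-haiku-4-5"),
                                ("gemini", "gemini-2.5-flash")],
}


class ProviderDown(Exception):
    """Stand-in for an outage-class error (5xx, connection refused, ...)."""


def generate_with_failover(call: Callable[[str, str, str], dict],
                           provider: str, model: str, prompt: str,
                           equivalents: Dict[Target, List[Target]] = EQUIVALENTS) -> dict:
    chain = [(provider, model)] + equivalents.get((provider, model), [])
    errors = []
    for prov, mdl in chain:
        try:
            result = call(prov, mdl, prompt)
            result["served_by"] = (prov, mdl)  # record which vendor actually answered
            return result
        except ProviderDown as exc:  # only outages trigger failover
            errors.append(f"{prov}/{mdl}: {exc}")
    raise ProviderDown("all equivalents failed: " + "; ".join(errors))


def flaky_vendor_call(provider: str, model: str, prompt: str) -> dict:
    if provider == "openai":
        raise ProviderDown("503 Service Unavailable")  # simulate an OpenAI outage
    return {"text": f"[{provider}/{model}] {prompt}"}


result = generate_with_failover(flaky_vendor_call, "openai", "gpt-4o-mini", "ping")
print(result["served_by"])
```

Failing over only on outage-class errors is deliberate: an invalid request would fail the same way on every vendor, so it should surface to the caller immediately.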

06

Prompt caching

Vendor-native prompt caching for OpenAI, Anthropic, DeepSeek, and Gemini — discounts already applied in cost.

07

Batch processing

OpenAI and Anthropic batch endpoints behind a single submit_batch() with poll and wait primitives.

08

Request deduplication

Single-flight coalescing collapses identical concurrent calls into one upstream request.
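Single-flight coalescing can be sketched with asyncio: the first caller for a key starts the upstream task, and every later caller with the same key awaits that same task. `SingleFlight` is a hypothetical helper, not Loom's class; in practice the key would be a stable hash of provider, model, params, and prompt.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Hashable


class SingleFlight:
    """Coalesce concurrent calls that share a key onto one in-flight task."""

    def __init__(self) -> None:
        self._inflight: Dict[Hashable, asyncio.Future] = {}

    async def do(self, key: Hashable, fn: Callable[[], Awaitable[Any]]) -> Any:
        fut = self._inflight.get(key)
        if fut is None:
            # First caller for this key: start the real upstream request.
            fut = asyncio.ensure_future(fn())
            self._inflight[key] = fut
            # Forget the key once the request settles, so later calls go upstream again.
            fut.add_done_callback(lambda _f: self._inflight.pop(key, None))
        # shield(): one caller being cancelled must not cancel the shared request.
        return await asyncio.shield(fut)


async def demo() -> None:
    flight = SingleFlight()
    upstream_calls = 0

    async def upstream() -> str:
        nonlocal upstream_calls
        upstream_calls += 1
        await asyncio.sleep(0.05)  # simulated network latency
        return "answer"

    # Five identical concurrent calls share a single upstream request.
    results = await asyncio.gather(*(flight.do("same-call-key", upstream) for _ in range(5)))
    print(results, "upstream calls:", upstream_calls)


asyncio.run(demo())
```

Unlike a response cache, this dedup only lasts while the request is in flight; if the upstream call raises, every waiter sees the same exception and the next call retries fresh.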

09

Cost tracking

Every result carries cost.usd and cost.local computed from the catalog. Pricing tracked per model.
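Per-call USD cost follows directly from per-1M-token catalog prices. A minimal sketch using the `input_cost_per_1m`/`output_cost_per_1m` fields from the `register_model` example below; the `ModelPrice` class, the `cached_input_discount` field, and `cost_usd` are hypothetical, and real vendors price cached tokens in their own ways.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    input_cost_per_1m: float            # USD per 1M input tokens
    output_cost_per_1m: float           # USD per 1M output tokens
    cached_input_discount: float = 0.0  # hypothetical: fraction off for cache-hit prefix tokens


def cost_usd(price: ModelPrice, input_tokens: int, output_tokens: int,
             cached_input_tokens: int = 0) -> float:
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_input_tokens
    cached_rate = price.input_cost_per_1m * (1.0 - price.cached_input_discount)
    return (uncached * price.input_cost_per_1m
            + cached_input_tokens * cached_rate
            + output_tokens * price.output_cost_per_1m) / 1_000_000


# Prices from the newco-large example; the 50% cache discount is an assumption.
newco_large = ModelPrice(input_cost_per_1m=2.50, output_cost_per_1m=10.00,
                         cached_input_discount=0.5)
print(cost_usd(newco_large, input_tokens=1_200, output_tokens=300))
print(cost_usd(newco_large, input_tokens=1_200, output_tokens=300, cached_input_tokens=1_000))
```

Worked by hand: 1,200 × 2.50 / 1M + 300 × 10.00 / 1M = $0.003 + $0.003 = $0.006. If 1,000 of the input tokens hit the cache at the assumed 50% discount, the input side drops to (200 × 2.50 + 1,000 × 1.25) / 1M = $0.00175, for $0.00475 in total.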

10

Unified observability

Structured INFO line per call: provider, model, latency, tokens, cost. SQLite sink + Flask dashboard.
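One way to get both a structured INFO line and a SQLite sink from a single logging call is a custom `logging.Handler`. A minimal, standard-library-only sketch; the field names, table schema, logger name, `SQLiteSink`, and `log_call` are assumptions, not Loom's actual schema or API.

```python
import logging
import sqlite3


class SQLiteSink(logging.Handler):
    """Store per-call fields (passed via logging's `extra=`) as rows in SQLite."""

    FIELDS = ("provider", "model", "latency_ms", "total_tokens", "cost_usd")

    def __init__(self, path: str = ":memory:") -> None:
        super().__init__(level=logging.INFO)
        # Note: sqlite3 connections are tied to the creating thread by default.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS calls (ts REAL, provider TEXT, model TEXT, "
            "latency_ms REAL, total_tokens INTEGER, cost_usd REAL)"
        )

    def emit(self, record: logging.LogRecord) -> None:
        if not all(hasattr(record, name) for name in self.FIELDS):
            return  # not a per-call record; skip it
        row = (record.created, *(getattr(record, name) for name in self.FIELDS))
        try:
            self.conn.execute("INSERT INTO calls VALUES (?, ?, ?, ?, ?, ?)", row)
            self.conn.commit()
        except sqlite3.Error:
            self.handleError(record)


logger = logging.getLogger("calls_sketch")  # hypothetical logger name
logger.setLevel(logging.INFO)
sink = SQLiteSink()
logger.addHandler(sink)


def log_call(provider: str, model: str, latency_ms: float,
             total_tokens: int, cost_usd: float) -> None:
    fields = {"provider": provider, "model": model, "latency_ms": latency_ms,
              "total_tokens": total_tokens, "cost_usd": cost_usd}
    # One human-readable key=value INFO line; the same dict rides along as
    # LogRecord attributes so handlers like SQLiteSink can read typed values.
    logger.info(" ".join(f"{k}={v}" for k, v in fields.items()), extra=fields)


log_call("openai", "gpt-4o-mini", 120.0, 42, 0.000012)
print(sink.conn.execute("SELECT provider, model, total_tokens FROM calls").fetchall())
```

A multi-threaded service would need a queue (for example `logging.handlers.QueueHandler`) or a per-thread connection in front of this sink.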

11

API key centralization

AWS Secrets Manager, GCP Secret Manager, and HashiCorp Vault backends in the box. Keys live in one place.

12

OpenAI-compatible adapters

Register any OpenAI-shape vendor in ~10 lines with register_openai_compatible(key, label, base_url, api_key_env).

Observability

One control plane for every AI call.

Cost, latency, retries, cache hits, and provider health — captured for every request, surfaced in a single dashboard.

Dashboard preview (loom.dev / observability): headline tiles for requests per minute, P95 latency, cache hit rate, and cost saved over 24h; requests over time; token usage by provider; provider health with per-provider latency (OpenAI, Anthropic, Gemini, xAI Grok, Mistral, DeepSeek, MiniMax, Z.AI); recent retries and failovers (e.g. openai → anthropic); and a live request log of structured INFO lines showing provider, model, latency, cache status, and cost.

Architecture

The path of a single generate() call.

Every request flows the same way — through cache, router, and adapter to the upstream vendor, then back with cost and a structured log. Optimization happens once, in the middle. New providers plug in without touching consuming code.

01
generate()
One contract — sync and async. The model resolves against the catalog, params merge over defaults, and a stable call key is derived for everything downstream.
02
Cache + dedup
A cache hit returns instantly. On a miss, single-flight dedup collapses identical in-flight calls onto one upstream request — the rest wait and share the result.
03
Router
Picks a candidate and validates the response. On error it fails over to the next provider in order — the caller never sees the retry seam.
04
Adapter
The chosen vendor's native SDK, wrapped in retry. Prompt caching, grounding, image polling, and streaming are preserved — not flattened to a lowest common denominator.
05
Response path
The result is enriched with usage and cost, written back to cache, and emitted as a structured log to observability — every call, every provider.
Diagram: generate() (resolve model · merge params) → call key → Cache (lookup + single-flight dedup); hit → cached; miss → Router (candidate selection) → pick → Adapter (native SDK · wrapped in retry) → invoke → Upstream provider (OpenAI · Anthropic · Gemini · …); error → failover; response path: enrich · cost · structured log.
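Steps 01 and 02 can be sketched together: merge params over defaults, hash a canonical JSON encoding into a stable call key, and use that key for a TTL response cache. Everything here (`call_key`, `ResponseCache`, SHA-256 over sorted-key JSON, the one-hour default TTL) is an assumption for illustration, not Loom's actual key format or cache.

```python
import hashlib
import json
import time
from typing import Any, Callable, Dict, Optional, Tuple


def call_key(provider: str, modality: str, model: str, prompt: str,
             params: Optional[Dict[str, Any]] = None,
             defaults: Optional[Dict[str, Any]] = None) -> str:
    merged = {**(defaults or {}), **(params or {})}  # caller params override defaults
    payload = {"provider": provider, "modality": modality, "model": model,
               "prompt": prompt, "params": merged}
    # sort_keys + fixed separators make the encoding independent of dict order.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ResponseCache:
    """In-memory response cache with a per-entry time-to-live."""

    def __init__(self, ttl_s: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_s = ttl_s
        self.clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None  # miss: the caller proceeds to dedup and the router
        stored_at, value = entry
        if self.clock() - stored_at > self.ttl_s:
            del self._entries[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._entries[key] = (self.clock(), value)


key = call_key("openai", "text", "gpt-4o-mini", "Summarize this transcript.",
               params={"temperature": 0.2}, defaults={"temperature": 1.0, "max_tokens": 256})
cache = ResponseCache()
if cache.get(key) is None:
    cache.set(key, {"text": "...", "cost": {"usd": 0.0}})  # stand-in for an upstream result
print(key[:16], cache.get(key))
```

Hashing the merged params (not just the caller's overrides) means two calls that resolve to the same effective request share a key, which is also what lets the single-flight step coalesce them.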

Upstream AI providers

OpenAI · Anthropic · Gemini · xAI Grok · Mistral · DeepSeek · MiniMax · Z.AI · Black Forest Labs · Ideogram · ByteDance Seedream · Tencent Hunyuan
Developer experience

Designed for engineers who ship.

One stable function call per workflow. Every code path returns the same response shape — text, usage, cost — across every vendor.

loom.generate · generate()
app.py · python
import loom

result = loom.generate(
    provider="openai",
    modality="text",
    model="gpt-4o-mini",
    prompt="Summarize this transcript in five bullets.",
)

print(result["text"])
print(result["cost"]["usd"])      # 0.0000038
print(result["usage"]["total_tokens"])
loom.async · async
api.py · python
from fastapi import FastAPI
from loom import AsyncLoom

app = FastAPI()
aclient = AsyncLoom.from_env()

@app.get("/answer")
async def answer(q: str):
    result = await aclient.generate(
        provider="openai",
        modality="text",
        model="gpt-4o-mini",
        prompt=q,
    )
    return {"text": result["text"], "cost": result["cost"]["usd"]}
smart routing
router.py · python
from loom import Loom, Router, Candidate

router = Router(
    candidates=[
        ("openai",    "text", "gpt-4o-mini"),
        Candidate("anthropic", "text", "claude-haiku-4-5"),
        ("openai",    "text", "gpt-4o", {"temperature": 0.2}),
    ],
    validator=lambda r: len(r["text"]) > 40,
)

client = Loom.from_env()
result = client.route(router, prompt="Explain quantum entanglement.")

print(result["_router"]["used"])    # which model won
print(result["_router"]["tried"])   # every candidate attempted
register provider
setup.py · python
from loom import Catalog

c = Catalog.from_yaml("models.yaml")

# Register a new OpenAI-shape vendor in ~10 lines:
c.register_openai_compatible(
    key="newco",
    label="NewCo AI",
    base_url="https://api.newco.ai/v1",
    api_key_env="NEWCO_API_KEY",
)

c.register_model(
    provider="newco",
    model_id="newco-large",
    upstream_model="newco-large-2026-01",
    input_cost_per_1m=2.50,
    output_cost_per_1m=10.00,
)
batch
batch.py · python
from loom import Loom, BatchRequest

client = Loom.from_env()

handle = client.submit_batch([
    BatchRequest(provider="openai", modality="text",
                 model="gpt-4o-mini",
                 prompt="summarize row 1", custom_id="row-1"),
    BatchRequest(provider="openai", modality="text",
                 model="gpt-4o-mini",
                 prompt="summarize row 2", custom_id="row-2"),
])

print(handle.id, handle.status())
results = handle.wait(poll_interval=60.0, timeout=24 * 3600)
Cost optimization

Built-in savings on every call.

Response cache, vendor prompt cache, smart routing, and batch APIs — orchestrated automatically. Targets validated in production.

Response cache · 20–60% saved on workloads with repeated queries.
Prompt cache · 50–90% discount on cached prefix tokens at the vendor.
Smart routing · 50–80% saved on mixed workloads via cheap-first dispatch.
Batch API · ~50% cheaper than real-time calls, with a ~24h SLA.
Chart: workload cost over 30 days, baseline vs. with Loom (optimization layer at work). Saved: −58.4%.
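How much cheap-first routing saves depends on how often the cheap model passes validation. A back-of-the-envelope sketch, assuming every request pays for the cheap attempt and only failures also pay for the strong model (retries and validator cost ignored). The `cheap_first_savings` helper, the 16× price ratio, and the 70% pass rate are hypothetical inputs, not measured Loom figures.

```python
def cheap_first_savings(cheap_cost: float, strong_cost: float, pass_rate: float) -> float:
    """Fraction saved versus sending every request straight to the strong model.

    Expected cost per request = cheap_cost + (1 - pass_rate) * strong_cost,
    because failed cheap attempts are still billed.
    """
    if not 0.0 <= pass_rate <= 1.0:
        raise ValueError("pass_rate must be in [0, 1]")
    expected = cheap_cost + (1.0 - pass_rate) * strong_cost
    return 1.0 - expected / strong_cost


# Hypothetical workload: the strong model costs 16x the cheap one,
# and 70% of cheap responses pass the validator.
print(f"{cheap_first_savings(1.0, 16.0, 0.7):.0%}")
```

With these inputs the expected cost is 1/16 + 0.3 = 0.3625 of the strong-only baseline, a saving of about 64%. Routing only pays off when cheap_cost < pass_rate × strong_cost; below that pass rate, the wasted cheap attempts cost more than they save.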

Build AI infrastructure once.

Unify providers, optimize costs, and ship AI products faster. One install. Every vendor. Production from day one.

$ pip install loom-router