v1.0 · semver stable

One API for every
AI provider.

Loom unifies AI providers behind a single interface with centralized routing, retries, caching, batching, observability, and cost optimization — while preserving every vendor's native capabilities.

Python · Sync + Async · MIT

14+ AI providers

OpenAI, Anthropic, Gemini, xAI, Mistral, DeepSeek, BFL, Ideogram…

Unified SDK

Single stable contract across text and image modalities.

Smart routing engine

Cheap-first, validator-driven, with cross-vendor failover.

100% observability

Cost, latency, tokens, and retries, captured on every call.

Infrastructure

Everything you need to run AI in production.

Twelve infrastructure primitives in one Python package. Production-ready since v1.0, semver-stable, observable end-to-end.

Unified AI API

One stable contract, generate(provider, modality, model, prompt), across every vendor and modality.

Native SDK preservation

Each vendor integrated with its native SDK. Prompt caching, grounding, streaming — preserved, not flattened.

Smart model routing

Cheap-first router with caller-supplied validators. Tries candidates in order, returns the first that passes.

Centralized retries

Exponential backoff with jitter, configurable per client. RetryPolicy(max_attempts=3, base_delay=0.5).

Vendor failover

Cross-vendor failover via a bundled equivalence map. When OpenAI is down, Anthropic answers instead.

Prompt caching

Vendor-native prompt caching for OpenAI, Anthropic, DeepSeek, and Gemini — discounts already applied in cost.

Batch processing

OpenAI and Anthropic batch endpoints behind a single submit_batch() with poll and wait primitives.

Request deduplication

Single-flight coalescing collapses identical concurrent calls into one upstream request.

Cost tracking

Every result carries cost.usd and cost.local computed from the catalog. Pricing tracked per model.

Unified observability

Structured INFO line per call: provider, model, latency, tokens, cost. SQLite sink + Flask dashboard.

API key centralization

AWS Secrets Manager, GCP Secret Manager, and HashiCorp Vault backends in the box. Keys live in one place.

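OpenAI-compatible adapters

Register any OpenAI-shape vendor through register_openai_compatible() with a key, a label, a base URL, and an API-key environment variable.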

Observability

One control plane for every AI call.

Cost, latency, retries, cache hits, and provider health — captured for every request, surfaced in a single dashboard.
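To make "captured for every request" concrete, here is a minimal sketch of one structured log line per call, built on Python's standard logging module. The logger name, the helper names, and the exact field names are assumptions for illustration; Loom's actual log schema may differ.

```python
import json
import logging

# Hypothetical logger name, chosen for this sketch only.
logger = logging.getLogger("loom.sketch")

def format_call(provider, model, latency_ms, total_tokens, cost_usd, cached=False):
    """Serialize one call's metrics as a single JSON line."""
    record = {
        "provider": provider,
        "model": model,
        "latency_ms": latency_ms,
        "total_tokens": total_tokens,
        "cost_usd": cost_usd,
        "cached": cached,
    }
    # sort_keys keeps the field order stable, which helps grep and diff.
    return json.dumps(record, sort_keys=True)

def log_call(**fields):
    """Emit the line at INFO; the root logger must allow INFO for it to appear."""
    logger.info(format_call(**fields))
```

Calling logging.basicConfig(level=logging.INFO) once at startup is enough for log_call(...) to print. A sink such as SQLite could parse the same JSON line.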

[Dashboard preview · loom.dev / observability · last 24h, all providers. Panels: Requests / min, P95 latency, Cache hit rate, Cost saved · 24h, Requests over time, Token usage by provider, Provider health, Recent retries and failover, Live request log (structured INFO).]

Architecture

The path of a single generate() call.

Every request flows the same way — through cache, router, and adapter to the upstream vendor, then back with cost and a structured log. Optimization happens once, in the middle. New providers plug in without touching consuming code.

generate()

One contract — sync and async. The model resolves against the catalog, params merge over defaults, and a stable call key is derived for everything downstream.
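One plausible way to derive a stable call key is to hash a canonical serialization of the fully resolved request. The sketch below assumes exactly that. The function name, the default parameters, and the choice of SHA-256 are illustrative assumptions, not Loom's documented behavior.

```python
import hashlib
import json

# Hypothetical defaults; a real catalog would supply these per model.
DEFAULT_PARAMS = {"temperature": 1.0, "max_tokens": 1024}

def call_key(provider, modality, model, prompt, params=None):
    """Derive a deterministic key from the fully resolved request."""
    merged = {**DEFAULT_PARAMS, **(params or {})}  # caller params override defaults
    payload = {
        "provider": provider,
        "modality": modality,
        "model": model,
        "prompt": prompt,
        "params": merged,
    }
    # sort_keys makes the serialization independent of dict insertion order.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because the defaults are merged before hashing, a call that spells out a default value and one that omits it produce the same key, so both can share a cache entry.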

Cache + dedup

A cache hit returns instantly. On a miss, single-flight dedup collapses identical in-flight calls onto one upstream request — the rest wait and share the result.
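Single-flight deduplication can be sketched with a lock and a shared Future: the first caller for a key does the work, and later callers with the same key wait on its result. The SingleFlight class below is a hypothetical, thread-based illustration of the technique, not Loom's implementation.

```python
import threading
from concurrent.futures import Future

class SingleFlight:
    """Collapse concurrent calls that share a key onto one execution."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}  # key -> Future owned by the leading caller

    def do(self, key, fn):
        with self._lock:
            fut = self._inflight.get(key)
            leader = fut is None
            if leader:
                fut = Future()
                self._inflight[key] = fut
        if not leader:
            return fut.result()  # followers block until the leader finishes
        try:
            fut.set_result(fn())
        except BaseException as exc:
            fut.set_exception(exc)  # followers see the same failure
        finally:
            with self._lock:
                self._inflight.pop(key, None)  # later calls start a fresh flight
        return fut.result()
```

The key would typically be the stable call key of the request, so only byte-identical requests are coalesced.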

Router

Picks a candidate and validates the response. On an error or a failed validation it moves to the next candidate in order, so the caller never sees the retry seam.
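The routing loop can be sketched in a few lines of plain Python. Here route, call, and the shape of the returned dict are stand-ins invented for illustration; the real Router and its result format are shown in the developer examples further down.

```python
def route(candidates, call, validator):
    """Try candidates in order; return the first response the validator accepts."""
    tried = []
    last_error = None
    for candidate in candidates:
        tried.append(candidate)
        try:
            response = call(candidate)
        except Exception as exc:  # provider error: move on to the next candidate
            last_error = exc
            continue
        if validator(response):
            return {"response": response, "used": candidate, "tried": tried}
    # Every candidate either raised or failed validation.
    raise RuntimeError(f"no candidate passed validation; tried {tried}") from last_error
```

Ordering the candidates from cheapest to most expensive is what makes this cheap-first: the expensive model is only paid for when the cheaper ones fail or return an answer the validator rejects.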

Adapter

The chosen vendor's native SDK, wrapped in retry. Prompt caching, grounding, image polling, and streaming are preserved — not flattened to a lowest common denominator.
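"Wrapped in retry" can be pictured as exponential backoff with jitter around the vendor call. The sketch below uses the full-jitter variant and borrows the max_attempts=3 and base_delay=0.5 values quoted for RetryPolicy. The function name, the max_delay cap, and the set of retryable exceptions are assumptions.

```python
import random
import time

def retry_with_backoff(fn, max_attempts=3, base_delay=0.5, max_delay=8.0,
                       retry_on=(TimeoutError, ConnectionError), sleep=time.sleep):
    """Call fn, retrying transient errors with full-jitter exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # The backoff ceiling doubles each attempt: base, 2*base, 4*base, ...
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter: sleep a uniform random time in [0, ceiling].
            sleep(random.uniform(0, ceiling))
```

The jitter spreads retries from many clients over time, so a recovering provider is not hit by synchronized bursts. The sleep parameter is injectable so the delays can be checked in tests without waiting.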

Response path

The result is enriched with usage and cost, written back to cache, and emitted as a structured log to observability — every call, every provider.
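The cost enrichment is simple arithmetic once per-million-token prices are known. A minimal sketch, assuming prices in USD per 1M tokens as in the catalog example further down; the function name is hypothetical, and prompt-cache discounts are left out for brevity.

```python
def cost_usd(input_tokens, output_tokens, input_cost_per_1m, output_cost_per_1m):
    """Cost of one call in USD, given per-million-token prices."""
    total = input_tokens * input_cost_per_1m + output_tokens * output_cost_per_1m
    return total / 1_000_000

# 1,200 input tokens at $2.50 per 1M plus 300 output tokens at $10.00 per 1M:
# (1200 * 2.50 + 300 * 10.00) / 1_000_000 = 0.006 USD
example = cost_usd(1200, 300, input_cost_per_1m=2.50, output_cost_per_1m=10.00)
```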

Upstream AI providers

OpenAI, Anthropic, Gemini, xAI Grok, Mistral, DeepSeek, MiniMax, Z.AI, Black Forest Labs, Ideogram, ByteDance Seedream, Tencent Hunyuan

Developer experience

Designed for engineers who ship.

One stable function call per workflow. Every code path returns the same response shape — text, usage, cost — across every vendor.

generate()

app.py · python

import loom

result = loom.generate(
    provider="openai",
    modality="text",
    model="gpt-4o-mini",
    prompt="Summarize this transcript in five bullets.",
)

print(result["text"])
print(result["cost"]["usd"])      # 0.0000038
print(result["usage"]["total_tokens"])

async

api.py · python

from fastapi import FastAPI
from loom import AsyncLoom

app = FastAPI()
aclient = AsyncLoom.from_env()

@app.get("/answer")
async def answer(q: str):
    result = await aclient.generate(
        provider="openai",
        modality="text",
        model="gpt-4o-mini",
        prompt=q,
    )
    return {"text": result["text"], "cost": result["cost"]["usd"]}

smart routing

router.py · python

from loom import Loom, Router, Candidate

router = Router(
    candidates=[
        ("openai",    "text", "gpt-4o-mini"),
        Candidate("anthropic", "text", "claude-haiku-4-5"),
        ("openai",    "text", "gpt-4o", {"temperature": 0.2}),
    ],
    validator=lambda r: len(r["text"]) > 40,
)

client = Loom.from_env()
result = client.route(router, prompt="Explain quantum entanglement.")

result["_router"]["used"]    # which model won
result["_router"]["tried"]   # all attempted

setup.py · python

from loom import Catalog

c = Catalog.from_yaml("models.yaml")

# Register a new OpenAI-shape vendor in ~10 lines:
c.register_openai_compatible(
    key="newco",
    label="NewCo AI",
    base_url="https://api.newco.ai/v1",
    api_key_env="NEWCO_API_KEY",
)

c.register_model(
    provider="newco",
    model_id="newco-large",
    upstream_model="newco-large-2026-01",
    input_cost_per_1m=2.50,
    output_cost_per_1m=10.00,
)

batch

batch.py · python

from loom import Loom, BatchRequest

client = Loom.from_env()

handle = client.submit_batch([
    BatchRequest(provider="openai", modality="text",
                 model="gpt-4o-mini",
                 prompt="summarize row 1", custom_id="row-1"),
    BatchRequest(provider="openai", modality="text",
                 model="gpt-4o-mini",
                 prompt="summarize row 2", custom_id="row-2"),
])

print(handle.id, handle.status())
results = handle.wait(poll_interval=60.0, timeout=24 * 3600)

Cost optimization

Built-in savings on every call.

Response cache, vendor prompt cache, smart routing, and batch APIs — orchestrated automatically. Targets validated in production.

Response cache

20–60%

Saved on workloads with repeated queries.

Prompt cache

50–90%

Discount on cached prefix tokens at the vendor.

Smart routing

50–80%

Saved on mixed workloads via cheap-first dispatch.

Batch API

~50%

Cheaper than real-time calls, with results returned within a ~24h SLA.

[Chart: Workload cost · 30 days. Optimization layer at work: Baseline vs. With Loom. Saved: −58.4%.]

Build AI infrastructure once.

Unify providers, optimize costs, and ship AI products faster. One install. Every vendor. Production from day one.

$ pip install loom-router
