Using Loom

One contract for every AI vendor. This guide takes you from install to production — calling generate(), reading the response, and turning on the optimization layer.

The mental model

Loom gives you one function for every provider:

quickstart.py (python)
import loom

result = loom.generate(
    provider="openai",
    modality="text",
    model="gpt-4o-mini",
    prompt="Say hi in five words.",
)
print(result["text"])
print(result["cost"]["usd"])

Four things identify what to run — provider, modality, model, prompt — and the return shape is the same no matter which vendor answered. Everything else (keys, retries, caching, cost accounting, logging) is handled inside Loom.

  • Catalog — the list of models Loom knows about. A model string like gpt-4o-mini resolves to an upstream model ID plus default params.
  • Provider adapter — the per-vendor module that makes the call with the vendor's native SDK.
  • Loom client — ties a catalog, your API keys, and the optimization layer together. The module-level loom.generate(...) runs on a default client built from your environment.
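
To make the adapter idea concrete, here is a minimal sketch of how two different vendor payload shapes could be normalized into one dict. It does not reproduce Loom's internals: the two payload shapes, `ADAPTERS`, and `normalize` are hypothetical stand-ins chosen for illustration.

```python
# Hypothetical sketch, NOT Loom's real adapter code. Two made-up vendor
# payload shapes are normalized into one {"kind", "text", "usage"} dict.

def from_vendor_a(raw: dict) -> dict:
    # Assumed shape: {"choices": [{"message": {"content": ...}}], "usage": {...}}
    return {
        "kind": "text",
        "text": raw["choices"][0]["message"]["content"],
        "usage": {
            "input_tokens": raw["usage"]["prompt_tokens"],
            "output_tokens": raw["usage"]["completion_tokens"],
        },
    }

def from_vendor_b(raw: dict) -> dict:
    # Assumed shape: {"content": [{"text": ...}], "usage": {...}}
    return {
        "kind": "text",
        "text": "".join(part["text"] for part in raw["content"]),
        "usage": {
            "input_tokens": raw["usage"]["input_tokens"],
            "output_tokens": raw["usage"]["output_tokens"],
        },
    }

ADAPTERS = {"vendor_a": from_vendor_a, "vendor_b": from_vendor_b}

def normalize(provider: str, raw: dict) -> dict:
    """Dispatch to the adapter registered for this provider."""
    return ADAPTERS[provider](raw)
```

Calling code only ever sees the normalized dict, which is why the same `result["text"]` access works whichever vendor answered.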

Install

terminal (bash)
pip install "loom-router[openai]"      # one vendor
pip install "loom-router[all]"         # every supported vendor

Extras are optional dependency groups: openai, anthropic, gemini, tencent, yaml, redis, all. Loom's only hard dependencies are requests and python-dotenv, so installing with no extra gives you the catalog and dispatcher but no vendor SDK.

Three ways to call

All three share the same catalog, cost computation, and logging — pick based on how much control you need over configuration.

Module-level — simplest

Runs on a default client built from Loom.from_env(). Best for scripts and notebooks.

python
import loom

result = loom.generate(
    provider="anthropic",
    modality="text",
    model="claude-sonnet-4-6",
    prompt="Write three subject lines for a launch email.",
)

from_env — long-lived

Build one client at startup and reuse it. Best for web apps and workers — avoids re-reading config on every call.

python
from loom import Loom

client = Loom.from_env()        # reads keys from .env + environment

result = client.generate(
    provider="anthropic",
    modality="text",
    model="claude-sonnet-4-6",
    prompt="...",
)

Explicit construction

Pass everything in directly — keys, catalog, cache. Best when config comes from a secrets manager rather than the environment.

python
from loom import Loom

client = Loom(
    api_keys={"OPENAI_API_KEY": "sk-...", "ANTHROPIC_API_KEY": "sk-ant-..."},
    local_currency="USD",
    local_to_usd=1.0,
)

Configuring API keys

Loom looks for a vendor key in three places, in order: the api_keys dict you passed the client, the process environment, then an optional vault backend. Keys use the standard vendor names (OPENAI_API_KEY, ANTHROPIC_API_KEY, GEMINI_API_KEY, …).
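
The lookup order can be sketched as a small function. This is an illustration of the precedence described above, not Loom's implementation; `resolve_key` and the `vault_get` callable are hypothetical names.

```python
import os
from typing import Callable, Mapping, Optional

def resolve_key(
    name: str,
    api_keys: Mapping[str, str],
    environ: Mapping[str, str] = os.environ,
    vault_get: Optional[Callable[[str], Optional[str]]] = None,
) -> Optional[str]:
    """Return the first value found: explicit dict, environment, then vault."""
    if name in api_keys:
        return api_keys[name]
    if name in environ:
        return environ[name]
    if vault_get is not None:
        return vault_get(name)   # hypothetical vault lookup, tried last
    return None
```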

Environment / .env — the default

Loom.from_env() loads a .env file from the working directory if present, without overriding values already set in the environment.

.env (bash)
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
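
The "without overriding" behavior can be illustrated with a simplified stdlib-only loader. This is a sketch of the semantics, not the parser Loom uses; `load_env_text` is a hypothetical helper and it ignores quoting and `export` prefixes that a real .env parser would handle.

```python
import os
from typing import MutableMapping

def load_env_text(text: str, environ: MutableMapping[str, str] = os.environ) -> None:
    """Apply KEY=VALUE lines, keeping any value the environment already has."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # setdefault only writes when the key is absent, so existing values win.
        environ.setdefault(key.strip(), value.strip())
```

A key exported in the shell therefore keeps its value even when the .env file lists a different one.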

Programmatic

Pass keys in; they win over the environment — the right hook for services pulling secrets at boot.

python
from loom import Loom

# secrets_manager is a placeholder for your own secrets client
client = Loom(api_keys={
    "OPENAI_API_KEY": secrets_manager.get("openai/prod"),
})

Vault

Point Loom at a bundled backend (AWS Secrets Manager, GCP Secret Manager, HashiCorp Vault) and let it fetch keys lazily, cached in-process with a 5-minute TTL.

python
from loom import Loom, AWSSecretsManagerVault

client = Loom.from_env(
    vault=AWSSecretsManagerVault(region_name="us-east-1", prefix="prod/loom/"),
)
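
Lazy fetching with a TTL can be sketched as follows. This is an assumption about the general pattern, not the bundled vault code; `TTLKeyCache`, `fetch`, and the injectable `clock` are hypothetical names.

```python
import time
from typing import Callable, Dict, Tuple

class TTLKeyCache:
    """Fetch a secret on first use, then reuse it until the TTL expires."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, name: str) -> str:
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]          # still fresh: no backend call
        value = self._fetch(name)    # lazy fetch on miss or expiry
        self._entries[name] = (now, value)
        return value
```

With a 300-second TTL, a rotated secret is picked up within five minutes without restarting the process.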

The generate call

python
client.generate(
    provider="openai",     # vendor name
    modality="text",       # "text" | "image" | "video"
    model="gpt-4o-mini",   # catalog model id
    prompt="...",          # the user prompt
    params={...},          # optional, merged over catalog defaults
    use_cache=True,        # optional, per-call cache opt-out
)

  • params is merged on top of the catalog defaults for that model, so you only specify what you want to override (max_tokens, temperature, vendor-specific flags like system, size).
  • use_cache=False forces a fresh upstream call even when a cache is configured.
  • Some catalog model IDs bake params in. For example gpt-image-1-low sets quality=low for you — you can still override via params.
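
The merge rule in the bullets above can be sketched with a toy catalog. The entry below is hypothetical: the upstream ID `gpt-image-1` and the default `size` are assumptions made for illustration, and `resolve` is not a Loom function.

```python
# Hypothetical catalog entry; the real bundled catalog may differ.
CATALOG = {
    ("openai", "image", "gpt-image-1-low"): {
        "upstream_model": "gpt-image-1",                      # assumed upstream ID
        "defaults": {"quality": "low", "size": "1024x1024"},  # assumed defaults
    },
}

def resolve(provider, modality, model, params=None):
    """Return (upstream model id, effective params) for one call."""
    entry = CATALOG[(provider, modality, model)]
    # Later keys win, so caller params override catalog defaults.
    effective = {**entry["defaults"], **(params or {})}
    return entry["upstream_model"], effective
```

Passing `params={"quality": "high"}` replaces only that key and leaves the other defaults in place.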

The response shape

Every successful call returns a dict with a kind discriminator and cost. Text responses also carry usage.

Text

python
{
    "kind": "text",
    "text": "...",
    "usage": {"input_tokens": 12, "output_tokens": 30, "cached_tokens": 0},
    "cost": {"usd": 0.000028, "local": 0.0023, "local_currency": "INR"},
}

Image

python
{
    "kind": "image",
    "images": [{"mime_type": "image/png", "data_b64": "..."}],
    "cost": {"usd": 0.01, "local": 0.83, "local_currency": "INR"},
}
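
Because every response carries `kind`, calling code can branch on it. A minimal sketch using only the two shapes shown above; `summarize` is a hypothetical helper, not part of Loom.

```python
import base64

def summarize(result: dict) -> str:
    """Branch on the "kind" discriminator and describe the payload."""
    if result["kind"] == "text":
        return result["text"]
    if result["kind"] == "image":
        # data_b64 holds base64-encoded image bytes.
        sizes = [len(base64.b64decode(img["data_b64"])) for img in result["images"]]
        return f"{len(sizes)} image(s), {sum(sizes)} bytes"
    raise ValueError(f"unhandled kind: {result['kind']!r}")
```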

Cost is reported in USD and a configurable local currency (default INR):

python
Loom(local_currency="USD", local_to_usd=1.0)    # report USD only
Loom(local_currency="EUR", local_to_usd=1.08)   # 1 EUR ~= 1.08 USD
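
Reading the comment above as "one unit of local currency is worth local_to_usd dollars", the conversion is a single division. This is an assumed interpretation, and Loom's own rounding may differ; `to_local` is a hypothetical helper.

```python
def to_local(usd: float, local_to_usd: float) -> float:
    """Convert USD to local currency, where 1 local unit = local_to_usd USD."""
    if local_to_usd <= 0:
        raise ValueError("local_to_usd must be positive")
    return usd / local_to_usd
```

Under that reading, a rate of about 0.012 USD per INR turns a 0.01 USD call into roughly 0.83 INR, which lines up with the image example above.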

Async

Every call has an async sibling. Use the module-level agenerate, or an AsyncLoom client for a long-lived service.

api.py (python)
import asyncio
from loom import AsyncLoom

aclient = AsyncLoom.from_env()

async def main():
    result = await aclient.generate(
        provider="gemini",
        modality="text",
        model="gemini-2.5-pro",
        prompt="...",
    )
    return result["text"]

asyncio.run(main())
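
The usual reason to go async is fan-out. Here is a sketch of running many prompts concurrently with a cap on in-flight calls. `fake_generate` is a stand-in coroutine so the example runs without Loom; in practice you would pass a function that awaits your client's generate call.

```python
import asyncio

async def gather_limited(coro_fn, prompts, limit=4):
    """Run coro_fn(prompt) for every prompt, at most `limit` at a time."""
    sem = asyncio.Semaphore(limit)

    async def one(prompt):
        async with sem:
            return await coro_fn(prompt)

    # gather preserves input order in its result list.
    return await asyncio.gather(*(one(p) for p in prompts))

async def fake_generate(prompt):
    # Stand-in for an awaitable client call; not a Loom API.
    await asyncio.sleep(0)
    return {"kind": "text", "text": prompt.upper()}
```

The semaphore keeps concurrency below vendor rate limits while results still come back in prompt order.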

Error handling

Loom maps each vendor's exception hierarchy onto one shared set, so you catch the same types regardless of provider.

python
import loom
from loom import AuthError, RateLimitError, ModelNotFoundError, ProviderError

try:
    result = loom.generate(provider=..., modality=..., model=..., prompt=...)
except AuthError:
    ...            # missing / invalid key - a config problem
except RateLimitError:
    ...            # back off, retry, or fail over to another model
except ModelNotFoundError:
    ...            # catalog doesn't know that (provider, modality, model)
except ProviderError:
    ...            # anything else from the upstream call
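
For the "back off, retry" branch, one common pattern is exponential backoff with jitter. The sketch below defines a local `RateLimitError` stand-in so it runs without Loom installed; `with_backoff` is a hypothetical helper. Loom retries rate limits by default, so a wrapper like this matters mainly when you construct the client with retry=None.

```python
import random
import time

class RateLimitError(Exception):
    """Local stand-in so the sketch runs without Loom installed."""

def with_backoff(call, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Retry `call` on RateLimitError with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise                      # out of attempts: surface the error
            delay = base_delay * (2 ** attempt)
            # Jitter spreads retries out so many workers do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
```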

Custom catalogs

To run your own model list (different pricing, internal IDs, an approved subset), point Loom at a YAML file. The schema matches the bundled catalog.

models.yaml (yaml)
openai:
  label: OpenAI
  modalities:
    text:
      - id: small
        name: Cheap default
        model: gpt-4o-mini
        input_inr_per_1m: 14.4578
        output_inr_per_1m: 57.8312

python
from loom import Loom, Catalog

client = Loom(catalog=Catalog.from_yaml("models.yaml"))
client.generate(provider="openai", modality="text", model="small", prompt="hi")

You can also register models programmatically with catalog.register_model(...).

Optimization features

The cache is off by default; retry and dedup are on with safe defaults. Enable the rest once your application's behavior is stable. Each feature composes with the others.

Response cache

Identical (provider, model, prompt, params) calls hit the cache instead of the API.

python
from loom import Loom, InMemoryCache, RedisCache

client = Loom(cache=InMemoryCache(maxsize=10_000, ttl=3600))

# multi-process deployments - share the cache across workers
client = Loom(cache=RedisCache(url="redis://internal:6379/0", ttl=3600))
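
"Identical" calls need a stable key. One way to derive it, shown here as an assumption rather than Loom's actual scheme, is to hash a canonical JSON encoding of the call identity; `cache_key` is a hypothetical helper.

```python
import hashlib
import json

def cache_key(provider: str, model: str, prompt: str, params: dict) -> str:
    """Stable digest of the call identity; dict key order does not matter."""
    payload = json.dumps(
        {"provider": provider, "model": model, "prompt": prompt, "params": params},
        sort_keys=True, separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Sorting keys means `{"temperature": 0, "max_tokens": 5}` and `{"max_tokens": 5, "temperature": 0}` map to the same entry, while any change to the prompt or params produces a different key.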

Vendor prompt caching

OpenAI and DeepSeek cache long prompt prefixes automatically — the saving shows up in result["usage"]["cached_tokens"] and is already discounted in result["cost"]. Anthropic is opt-in:

python
loom.generate(
    provider="anthropic", modality="text", model="claude-haiku-4-5",
    prompt=user_question,
    params={"system": LONG_STATIC_SYSTEM_PROMPT, "cache_system": True},
)

Gemini uses an uploaded context-cache resource referenced by ID — the right shape when one big context answers many small questions:

python
cache = client.create_context_cache(
    provider="gemini", model="gemini-2.5-flash",
    contents=long_static_document, ttl_seconds=600,
)
result = client.generate(
    provider="gemini", modality="text", model="gemini-2.5-flash",
    prompt=user_question, params={"cached_content": cache.id},
)
client.delete_context_cache(cache)

Cheap-first routing

Try a small model first; escalate only when a validator says the answer isn't good enough.

python
from loom import Loom, Router

router = Router(
    candidates=[
        ("openai", "text", "gpt-4o-mini"),   # try cheap first
        ("openai", "text", "gpt-4o"),        # escalate
    ],
    validator=lambda result: "I don't know" not in result["text"],
)

result = Loom.from_env().route(router, prompt=user_question)
print(result["_router"]["used"])   # which model actually answered
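
The escalation logic behind cheap-first routing can be sketched as a plain loop. This is an illustration of the idea, not the Router implementation; `route_cheap_first` and the `call` callable are hypothetical, and the `_router` payload here only mimics the `used` field shown above.

```python
def route_cheap_first(candidates, call, validator):
    """Try candidates in order; return the first result the validator accepts.

    If none pass, the last candidate's result is returned as a best effort.
    Returns None when `candidates` is empty.
    """
    result = None
    for candidate in candidates:
        result = call(candidate)
        result["_router"] = {"used": candidate}   # record who answered
        if validator(result):
            break
    return result
```

Candidate order is the cost policy: the expensive model is only billed when the validator rejects the cheap answer.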

Cross-vendor failover

Stay up when one vendor flakes by falling through an equivalence map.

python
from loom import Loom, Router

router = Router.failover(provider="openai", modality="text", model="gpt-4o-mini")
result = Loom.from_env().route(router, prompt=question)

Loom tries OpenAI first; on failure it falls through bundled equivalents (claude-haiku-4-5, gemini-2.5-flash, deepseek-v3 by default). You need keys for every vendor in the chain.
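
Failover differs from cheap-first routing in its trigger: it moves on when a call raises, not when a validator rejects the answer. A minimal sketch, with a local `ProviderError` stand-in and a hypothetical `call_with_failover` helper rather than Loom's own code:

```python
class ProviderError(Exception):
    """Local stand-in so the sketch runs without Loom installed."""

def call_with_failover(chain, call):
    """Try each (provider, model) in order; return the first success."""
    errors = []
    for provider, model in chain:
        try:
            return call(provider, model)
        except ProviderError as exc:
            errors.append((provider, model, exc))   # remember and fall through
    raise ProviderError(f"all {len(errors)} candidates failed")
```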

Batch API

For workloads that tolerate ~24h latency, vendor batch endpoints are ~50% cheaper. OpenAI and Anthropic batch adapters ship today.

batch.py (python)
from loom import Loom, BatchRequest

client = Loom.from_env()

# rows is your own iterable of (row_id, text) pairs
requests = [
    BatchRequest(provider="openai", modality="text",
                 model="gpt-4o-mini", prompt=text, custom_id=row_id)
    for row_id, text in rows
]

handle = client.submit_batch(requests)
results = handle.wait()          # aligned to input order; pick up later via handle.id

# or submit and block until done in one call:
results = client.run_batch(requests, poll_interval=60.0)
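
Batch results from a vendor can come back in a different order than the requests, which is what custom_id is for. Re-aligning them could look like the sketch below. Plain dicts stand in for request and result objects, and `align_by_custom_id` is a hypothetical helper, not a Loom function.

```python
def align_by_custom_id(requests, results):
    """Return results ordered like `requests`, matching on custom_id.

    Missing results become None so positions still line up.
    """
    by_id = {r["custom_id"]: r for r in results}
    return [by_id.get(req["custom_id"]) for req in requests]
```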

Per-call opt-outs

python
client.generate(..., use_cache=False)   # per call: skip the cache for this call
Loom(dedup=False)                       # per client: don't coalesce concurrent identical calls
Loom(retry=None)                        # per client: don't retry on rate limits
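
Dedup means concurrent identical calls share one upstream request. A sketch of the idea using asyncio, written as an illustration rather than Loom's implementation; `Coalescer` is a hypothetical class.

```python
import asyncio

class Coalescer:
    """Share one in-flight task among concurrent callers with the same key."""

    def __init__(self):
        self._inflight = {}

    async def run(self, key, coro_fn):
        task = self._inflight.get(key)
        if task is None:
            task = asyncio.create_task(coro_fn())
            self._inflight[key] = task
            # Forget the task once it settles so later calls run fresh.
            task.add_done_callback(lambda _: self._inflight.pop(key, None))
        return await task
```

Unlike a response cache, nothing is kept after the call finishes: a later call with the same key goes upstream again.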

Observability

Turn on the bundled SQLite sink + Flask dashboard for a queryable record of every call and a one-page cost/latency/cache summary.

admin.py (python)
import logging
from flask import Flask
from loom.observability import SQLiteSink, LoomLogHandler, make_dashboard

sink = SQLiteSink("loom_events.db")
logging.getLogger("loom").addHandler(LoomLogHandler(sink))

app = Flask(__name__)
app.register_blueprint(make_dashboard(sink), url_prefix="/loom-admin")

Every call now lands in loom_events.db, and /loom-admin/ shows cost by provider, top spend by model, cache-hit rate, dedup rate, and the last 25 calls. The dashboard ships without auth — wrap it in your host app's login like any internal admin page.