The mental model
Loom gives you one function for every provider:
import loom
result = loom.generate(
    provider="openai",
    modality="text",
    model="gpt-4o-mini",
    prompt="Say hi in five words.",
)
print(result["text"])
print(result["cost"]["usd"])
Four things identify what to run (provider, modality, model, prompt), and the return shape is the same no matter which vendor answered. Everything else (keys, retries, caching, cost accounting, logging) is handled inside Loom.
- Catalog: the list of models Loom knows about. A model string like gpt-4o-mini resolves to an upstream model ID plus default params.
- Provider adapter: the per-vendor module that makes the call with the vendor's native SDK.
- Loom client: ties a catalog, your API keys, and the optimization layer together. The module-level loom.generate(...) runs on a default client built from your environment.
Install
pip install "loom-router[openai]" # one vendor
pip install "loom-router[all]" # every supported vendor
Extras are optional dependency groups: openai, anthropic, gemini, tencent, yaml, redis, all. Loom's only hard dependencies are requests and python-dotenv, so installing with no extra gives you the catalog and dispatcher but no vendor SDK.
Three ways to call
All three share the same catalog, cost computation, and logging — pick based on how much control you need over configuration.
Module-level — simplest
Runs on a default client built from Loom.from_env(). Best for scripts and notebooks.
import loom
result = loom.generate(
    provider="anthropic",
    modality="text",
    model="claude-sonnet-4-6",
    prompt="Write three subject lines for a launch email.",
)
from_env: long-lived
Build one client at startup and reuse it. Best for web apps and workers — avoids re-reading config on every call.
from loom import Loom
client = Loom.from_env() # reads keys from .env + environment
result = client.generate(
    provider="anthropic",
    modality="text",
    model="claude-sonnet-4-6",
    prompt="...",
)
Explicit construction
Pass everything in directly — keys, catalog, cache. Best when config comes from a secrets manager rather than the environment.
from loom import Loom
client = Loom(
    api_keys={"OPENAI_API_KEY": "sk-...", "ANTHROPIC_API_KEY": "sk-ant-..."},
    local_currency="USD",
    local_to_usd=1.0,
)
Configuring API keys
Loom looks for a vendor key in three places, in order: the api_keys dict you passed the client, the process environment, then an optional vault backend. Keys use the standard vendor names (OPENAI_API_KEY, ANTHROPIC_API_KEY, GEMINI_API_KEY, …).
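The lookup order can be pictured as a first-match chain. The sketch below is a standalone illustration of that precedence, not Loom's implementation: resolve_key and its vault_get callback are hypothetical names, and the key values are placeholders.

```python
from typing import Callable, Mapping, Optional


def resolve_key(
    name: str,
    api_keys: Mapping[str, str],
    environ: Mapping[str, str],
    vault_get: Optional[Callable[[str], Optional[str]]] = None,
) -> str:
    """Return the first match: explicit dict, then environment, then vault."""
    if name in api_keys:
        return api_keys[name]
    if name in environ:
        return environ[name]
    if vault_get is not None:
        value = vault_get(name)
        if value is not None:
            return value
    raise KeyError(f"no API key found for {name}")


# Placeholder inputs for illustration only.
explicit = {"OPENAI_API_KEY": "sk-from-dict"}
env = {"OPENAI_API_KEY": "sk-from-env", "ANTHROPIC_API_KEY": "sk-ant-from-env"}
vault = {"GEMINI_API_KEY": "from-vault"}.get  # stand-in for a vault backend

print(resolve_key("OPENAI_API_KEY", explicit, env, vault))     # explicit dict wins
print(resolve_key("ANTHROPIC_API_KEY", explicit, env, vault))  # falls back to env
print(resolve_key("GEMINI_API_KEY", explicit, env, vault))     # falls back to vault
```

In real code you would pass os.environ as the environment mapping; a plain dict is used here so the example has no side effects.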
Environment / .env — the default
Loom.from_env() loads a .env file from the working directory if present, without overriding values already set in the environment.
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
Programmatic
Pass keys in; they win over the environment — the right hook for services pulling secrets at boot.
from loom import Loom
client = Loom(api_keys={
    "OPENAI_API_KEY": secrets_manager.get("openai/prod"),
})
Vault
Point Loom at a bundled backend (AWS Secrets Manager, GCP Secret Manager, HashiCorp Vault) and let it fetch keys lazily, cached in-process with a 5-minute TTL.
from loom import Loom, AWSSecretsManagerVault
client = Loom.from_env(
    vault=AWSSecretsManagerVault(region_name="us-east-1", prefix="prod/loom/"),
)
The generate call
client.generate(
    provider="openai",       # vendor key
    modality="text",         # "text" | "image" | "video"
    model="gpt-4o-mini",     # catalog model id
    prompt="...",            # the user prompt
    params={...},            # optional, merged over catalog defaults
    use_cache=True,          # optional, per-call cache opt-out
)
- params is merged on top of the catalog defaults for that model, so you only specify what you want to override (max_tokens, temperature, vendor-specific flags like system, size).
- use_cache=False forces a fresh upstream call even when a cache is configured.
- Some catalog model IDs bake params in. For example, gpt-image-1-low sets quality=low for you; you can still override via params.
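The override behavior can be sketched as a plain dictionary merge. This is an assumption about the semantics (a shallow merge where the caller's values win), not Loom's actual code; merge_params and the default values shown are hypothetical.

```python
from typing import Any, Dict, Optional


def merge_params(
    catalog_defaults: Dict[str, Any],
    params: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Shallow-merge caller params over catalog defaults without mutating either."""
    merged = dict(catalog_defaults)
    merged.update(params or {})
    return merged


# Hypothetical defaults for a model ID that bakes in quality=low.
defaults = {"quality": "low", "size": "1024x1024"}

print(merge_params(defaults))                       # defaults pass through
print(merge_params(defaults, {"quality": "high"}))  # caller overrides one field
```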
The response shape
Every successful call returns a dict with a kind discriminator plus usage and cost.
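Because kind is a discriminator, calling code can branch on it before touching modality-specific fields. A minimal sketch, assuming only the "text" and "image" shapes listed in this section; summarize is a hypothetical helper and the sample dicts are made up.

```python
import base64
from typing import Any, Dict


def summarize(result: Dict[str, Any]) -> str:
    """Branch on the kind discriminator and describe the payload."""
    usd = result["cost"]["usd"]
    if result["kind"] == "text":
        return f"text: {len(result['text'])} chars, ${usd:.6f}"
    if result["kind"] == "image":
        # Each image is base64-encoded; decode to measure the raw size.
        sizes = [len(base64.b64decode(img["data_b64"])) for img in result["images"]]
        return f"image: {len(sizes)} file(s), {sum(sizes)} bytes, ${usd:.6f}"
    raise ValueError(f"unhandled kind: {result['kind']!r}")


text_result = {"kind": "text", "text": "hello", "cost": {"usd": 0.000028}}
image_result = {
    "kind": "image",
    "images": [{"mime_type": "image/png", "data_b64": base64.b64encode(b"abc").decode()}],
    "cost": {"usd": 0.01},
}

print(summarize(text_result))
print(summarize(image_result))
```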
Text
{
    "kind": "text",
    "text": "...",
    "usage": {"input_tokens": 12, "output_tokens": 30, "cached_tokens": 0},
    "cost": {"usd": 0.000028, "local": 0.0023, "local_currency": "INR"},
}
Image
{
    "kind": "image",
    "images": [{"mime_type": "image/png", "data_b64": "..."}],
    "cost": {"usd": 0.01, "local": 0.83, "local_currency": "INR"},
}
Cost is reported in USD and a configurable local currency (default INR):
Loom(local_currency="USD", local_to_usd=1.0) # report USD only
Loom(local_currency="EUR", local_to_usd=1.08) # 1 EUR ~= 1.08 USD
Async
Every call has an async sibling. Use the module-level agenerate, or an AsyncLoom client for a long-lived service.
import asyncio
from loom import AsyncLoom
aclient = AsyncLoom.from_env()
async def main():
    result = await aclient.generate(
        provider="gemini",
        modality="text",
        model="gemini-2.5-pro",
        prompt="...",
    )
    return result["text"]
asyncio.run(main())
Error handling
Loom maps each vendor's exception hierarchy onto one shared set, so you catch the same types regardless of provider.
import loom
from loom import AuthError, RateLimitError, ModelNotFoundError, ProviderError
try:
    result = loom.generate(provider=..., modality=..., model=..., prompt=...)
except AuthError:
    ... # missing / invalid key - a config problem
except RateLimitError:
    ... # back off, retry, or fail over to another model
except ModelNotFoundError:
    ... # catalog doesn't know that (provider, modality, model)
except ProviderError:
    ... # anything else from the upstream call
Custom catalogs
To run your own model list (different pricing, internal IDs, an approved subset), point Loom at a YAML file. The schema matches the bundled catalog.
openai:
  label: OpenAI
  modalities:
    text:
      - id: small
        name: Cheap default
        model: gpt-4o-mini
        input_inr_per_1m: 14.4578
        output_inr_per_1m: 57.8312
from loom import Loom, Catalog
client = Loom(catalog=Catalog.from_yaml("models.yaml"))
client.generate(provider="openai", modality="text", model="small", prompt="hi")
You can also register models programmatically with catalog.register_model(...).
Optimization features
The response cache is off by default; retry and dedup are on with safe defaults. Enable the opt-in features once your behavior is stable. Each composes with the others.
Response cache
Identical (provider, model, prompt, params) calls hit the cache instead of the API.
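What "identical" means in practice comes down to how the cache key is built. The sketch below shows one common approach, hashing a canonical JSON encoding of the four fields so that dict ordering does not matter. It is an illustration only: cache_key is a hypothetical helper, and Loom's real key derivation may differ.

```python
import hashlib
import json
from typing import Any, Dict, Optional


def cache_key(
    provider: str,
    model: str,
    prompt: str,
    params: Optional[Dict[str, Any]] = None,
) -> str:
    """Hash a canonical JSON form so equal inputs always produce the same key."""
    payload = json.dumps(
        {"provider": provider, "model": model, "prompt": prompt, "params": params or {}},
        sort_keys=True,          # make key order irrelevant
        separators=(",", ":"),   # avoid whitespace differences
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


key_a = cache_key("openai", "gpt-4o-mini", "hi", {"temperature": 0, "max_tokens": 5})
key_b = cache_key("openai", "gpt-4o-mini", "hi", {"max_tokens": 5, "temperature": 0})
key_c = cache_key("openai", "gpt-4o-mini", "hi!", {"temperature": 0, "max_tokens": 5})

print(key_a == key_b)  # same call, different param order
print(key_a == key_c)  # different prompt
```

Loom's own caches are configured on the client: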
from loom import Loom, InMemoryCache, RedisCache
client = Loom(cache=InMemoryCache(maxsize=10_000, ttl=3600))
# multi-process deployments - share the cache across workers
client = Loom(cache=RedisCache(url="redis://internal:6379/0", ttl=3600))
Vendor prompt caching
OpenAI and DeepSeek cache long prompt prefixes automatically — the saving shows up in result["usage"]["cached_tokens"] and is already discounted in result["cost"]. Anthropic is opt-in:
loom.generate(
    provider="anthropic", modality="text", model="claude-haiku-4-5",
    prompt=user_question,
    params={"system": LONG_STATIC_SYSTEM_PROMPT, "cache_system": True},
)
Gemini uses an uploaded context-cache resource referenced by ID. This fits when one big context answers many small questions:
cache = client.create_context_cache(
    provider="gemini", model="gemini-2.5-flash",
    contents=long_static_document, ttl_seconds=600,
)
result = client.generate(
    provider="gemini", modality="text", model="gemini-2.5-flash",
    prompt=user_question, params={"cached_content": cache.id},
)
client.delete_context_cache(cache)
Cheap-first routing
Try a small model first; escalate only when a validator says the answer isn't good enough.
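The escalation logic itself is small. The following sketch shows the idea with a fake model call so it runs offline; cheap_first and fake_call are hypothetical, and returning the last answer when nothing validates is an assumption rather than documented Loom behavior.

```python
from typing import Any, Callable, Dict, List, Tuple

Candidate = Tuple[str, str, str]  # (provider, modality, model)


def cheap_first(
    candidates: List[Candidate],
    call: Callable[[Candidate, str], Dict[str, Any]],
    validator: Callable[[Dict[str, Any]], bool],
    prompt: str,
) -> Dict[str, Any]:
    """Try candidates in order; stop at the first result the validator accepts."""
    if not candidates:
        raise ValueError("need at least one candidate")
    for candidate in candidates:
        result = call(candidate, prompt)
        if validator(result):
            break
    # If nothing validated, the last candidate's answer is returned anyway.
    return {**result, "_router": {"used": candidate}}


def fake_call(candidate: Candidate, prompt: str) -> Dict[str, Any]:
    """Stand-in for an API call: the small model gives up, the large one answers."""
    _, _, model = candidate
    text = "I don't know" if model == "gpt-4o-mini" else "Paris"
    return {"kind": "text", "text": text}


answer = cheap_first(
    candidates=[("openai", "text", "gpt-4o-mini"), ("openai", "text", "gpt-4o")],
    call=fake_call,
    validator=lambda result: "I don't know" not in result["text"],
    prompt="What is the capital of France?",
)
print(answer["_router"]["used"])  # the candidate whose answer was kept
```

Loom's bundled Router expresses the same pattern: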
from loom import Loom, Router
router = Router(
    candidates=[
        ("openai", "text", "gpt-4o-mini"), # try cheap first
        ("openai", "text", "gpt-4o"),      # escalate
    ],
    validator=lambda result: "I don't know" not in result["text"],
)
result = Loom.from_env().route(router, prompt=user_question)
print(result["_router"]["used"]) # which model actually answered
Cross-vendor failover
Stay up when one vendor flakes by falling through an equivalence map.
from loom import Loom, Router
router = Router.failover(provider="openai", modality="text", model="gpt-4o-mini")
result = Loom.from_env().route(router, prompt=question)
Loom tries OpenAI first; on failure it falls through bundled equivalents (claude-haiku-4-5, gemini-2.5-flash, deepseek-v3 by default). You need keys for every vendor in the chain.
Batch API
For workloads that tolerate ~24h latency, vendor batch endpoints are ~50% cheaper. OpenAI and Anthropic batch adapters ship today.
from loom import Loom, BatchRequest
client = Loom.from_env()
# build the request list once so both call styles below can use it
requests = [
    BatchRequest(provider="openai", modality="text",
                 model="gpt-4o-mini", prompt=text, custom_id=row_id)
    for row_id, text in rows
]
handle = client.submit_batch(requests)
results = handle.wait() # aligned to input order; pick up later via handle.id
# or block until done:
results = client.run_batch(requests, poll_interval=60.0)
Per-call opt-outs
client.generate(..., use_cache=False) # per call: skip the cache
Loom(dedup=False)                     # client-wide: don't coalesce concurrent identical calls
Loom(retry=None)                      # client-wide: don't retry on rate limits
Observability
Turn on the bundled SQLite sink + Flask dashboard for a queryable record of every call and a one-page cost/latency/cache summary.
import logging
from flask import Flask
from loom.observability import SQLiteSink, LoomLogHandler, make_dashboard
sink = SQLiteSink("loom_events.db")
logging.getLogger("loom").addHandler(LoomLogHandler(sink))
app = Flask(__name__)
app.register_blueprint(make_dashboard(sink), url_prefix="/loom-admin")
Every call now lands in loom_events.db, and /loom-admin/ shows cost by provider, top spend by model, cache-hit rate, dedup rate, and the last 25 calls. The dashboard ships without auth, so wrap it in your host app's login like any internal admin page.