Stratum S6 · Weeks 8–12 · production cut-over by day 84 · Pressure 97

Serve

Decision APIs · Online features · Cache

Serve is the high-pressure pump. Sub-second decisions, governed by policy, cached at the edge, traced end-to-end. Your agents stop guessing and start acting.

  • p99 latency: 39ms
  • Availability: 99.95%
  • Cost / 1M calls: −41%

Deliverables

Everything that ships

  • Decision API gateway
    Typed endpoints, JWT + policy, rate-limit + circuit breaker.
  • Online feature serving
    Sub-10ms reads from Redis / DynamoDB, point-in-time correct.
  • Model serving lane
    Triton / vLLM / KServe with shadow + canary deploys.
  • Edge cache
    Cloudflare / CloudFront tiers for high-fan-out reads.
  • Observability
    OpenTelemetry traces from agent → API → feature → model.
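
As one illustration of the gateway's circuit breaker, here is a minimal sketch, not the Serve implementation. The class name `CircuitBreaker`, the thresholds, and the injectable `clock` are all assumptions made for the example. The breaker fails fast after a run of consecutive errors, then lets a single trial call through once a cool-down has passed.

```python
import time


class CircuitBreaker:
    """Hypothetical breaker: opens after `max_failures` consecutive errors."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after  # cool-down in seconds (assumed value)
        self.clock = clock              # injectable so the breaker can be tested
        self.failures = 0
        self.opened_at = None           # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            # Cool-down elapsed: half-open, so one trial call is allowed through.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        # A success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Wrapping each downstream call (feature store, model lane) in its own breaker keeps one slow dependency from stalling the whole decision path.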
Pod composition
  • Platform Engineer
  • ML Engineer
  • Reliability Engineer
Example output · Decision · /v1/next-best-action
POST /v1/next-best-action
{
  "customer_id": "c_8821",
  "context": { "channel": "app", "intent": "renew" }
}
→ 200 { "action": "offer_uplift_12mo", "score": 0.81, "lat_ms": 37 }
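
A typed client for this endpoint might look like the sketch below. The field names come from the example above. The class names `NextBestActionRequest` and `Decision` are hypothetical, and the field types are inferred from the sample values rather than from a published schema.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class NextBestActionRequest:
    """Hypothetical request type mirroring the example payload."""
    customer_id: str
    channel: str
    intent: str

    def to_json(self) -> str:
        # Nest channel and intent under "context", as in the example request.
        return json.dumps({
            "customer_id": self.customer_id,
            "context": {"channel": self.channel, "intent": self.intent},
        })


@dataclass(frozen=True)
class Decision:
    """Hypothetical response type; types are inferred from the sample values."""
    action: str
    score: float
    lat_ms: int

    @classmethod
    def from_json(cls, body: str) -> "Decision":
        data = json.loads(body)
        # A missing key raises KeyError; a non-numeric value raises ValueError.
        return cls(str(data["action"]), float(data["score"]), int(data["lat_ms"]))


request = NextBestActionRequest("c_8821", "app", "renew")
decision = Decision.from_json(
    '{"action": "offer_uplift_12mo", "score": 0.81, "lat_ms": 37}'
)
```

Parsing into a frozen dataclass means a malformed response fails at the boundary rather than deep inside an agent.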
Timeline

Weeks 8–12 · production cut-over by day 84

  1. Weeks 8–9
    Gateway + features

    Typed APIs, JWT + policy, online features under 10ms.

  2. Weeks 9–11
    Model lane

    Triton/vLLM/KServe with shadow + canary; OTel traces end-to-end.

  3. Weeks 11–12
    Edge + cut-over

    Cloudflare cache tier; production cut-over with rollback.
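
Canary routing with rollback can be sketched as a deterministic traffic split. This is an illustration under assumptions, not the actual cut-over tooling: the function name `route`, the `canary_percent` parameter, and hashing on the customer ID are all choices made for the example.

```python
import hashlib


def route(customer_id: str, canary_percent: int) -> str:
    """Send roughly `canary_percent` percent of customers to the canary lane."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    # Map the customer to a stable bucket in 0..99 so routing is sticky.
    bucket = int.from_bytes(digest[:2], "big") % 100
    return "canary" if bucket < canary_percent else "stable"


lane = route("c_8821", canary_percent=10)
```

Because the bucket depends only on the customer ID, a customer stays on the same lane as the percentage ramps up, and setting `canary_percent` to 0 sends everyone back to the stable lane.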

FAQs

Things prospects ask

Can we bring our own models?

Yes, anything that speaks gRPC or HTTP. We wrap it with policy enforcement, observability, and shadow deploys.
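
To make "shadow deploy" concrete, here is a minimal sketch assuming each model is a plain callable. The function `serve_with_shadow` and the log format are hypothetical. The caller always receives the primary model's answer, while the shadow model sees the same input and its output is recorded only for comparison.

```python
def serve_with_shadow(primary, shadow, request, log):
    """Return the primary result; record the shadow result without serving it."""
    result = primary(request)
    try:
        shadow_result = shadow(request)
        log.append({
            "request": request,
            "primary": result,
            "shadow": shadow_result,
            "agree": result == shadow_result,
        })
    except Exception as exc:
        # A failing shadow model must never affect the live response.
        log.append({"request": request, "primary": result, "shadow_error": repr(exc)})
    return result
```

In this sketch the shadow call runs inline for simplicity. A production setup would more likely run it off the request path so it adds no latency.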

How do you handle bursty traffic?

We combine per-tenant rate limits, circuit breakers, an edge cache for high-fan-out reads, and autoscaling driven by request budget.
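
A per-tenant rate limit is often built as a token bucket, so the sketch below uses that technique as an assumption; the page does not say which algorithm Serve uses. `TenantRateLimiter`, its parameters, and the injectable `clock` are hypothetical names.

```python
import time


class TenantRateLimiter:
    """Hypothetical token bucket: one bucket per tenant, refilled over time."""

    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self.rate = rate_per_sec  # tokens added per second
        self.burst = burst        # bucket capacity, i.e. the largest burst allowed
        self.clock = clock
        self.buckets = {}         # tenant_id -> (tokens, last_refill_time)

    def allow(self, tenant_id):
        now = self.clock()
        tokens, last = self.buckets.get(tenant_id, (self.burst, now))
        # Refill for the elapsed time, capped at the bucket capacity.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.buckets[tenant_id] = (tokens - 1, now)
            return True
        self.buckets[tenant_id] = (tokens, now)
        return False
```

Each tenant draws from its own bucket, so a burst from one tenant exhausts only that tenant's tokens and leaves the others unaffected.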

Commission · S6 Serve

Stand up Serve in Weeks 8–12.

We'll respond within one business day with a scoping note, a fixed-price outcome contract, and a named principal. Your details sync straight into our concierge queue.

  • Outcome-priced: no T&M.
  • Sovereign by default: your data, your region, your keys.
  • Wired into the Fuel Pressure gauge from day one.
By submitting you agree to our outreach for this enquiry. Your details are stored in our governed lead system.