Serverless and Traditional Hosting

Context

Agents and agent-specific asset processing logic lend themselves well to Temporal Workflows:
- They involve many individual steps that can fail (e.g. calls out to the model server or rate-limiting errors from OpenAI).
- Temporal enforces patterns for breaking these complex flows down into a series of simple activities.
- Temporal provides good observability into both static and, more importantly, dynamic workflows, which will come along once we start looking at AI Agents.
- Many of the appealing features of open source frameworks like LangGraph and LlamaIndex for building AI Agents (e.g. orchestration and state management) come with Temporal, so all engineers can use the same frameworks.
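To make the first two points concrete, here is a minimal sketch of a flow split into small steps, each retried with exponential backoff when a transient failure occurs. It uses only the Python standard library and is not Temporal's API: `RetryPolicy`, `run_activity`, `call_model_server`, `store_result`, and `RateLimitError` are hypothetical names chosen for illustration. Temporal would provide the retry behaviour and would also persist workflow state, which this sketch does not attempt.

```python
import asyncio
from dataclasses import dataclass


class RateLimitError(Exception):
    """Stand-in for a transient failure such as a rate-limit response."""


@dataclass(frozen=True)
class RetryPolicy:
    # Hypothetical policy object for this sketch only.
    max_attempts: int = 5
    initial_interval: float = 0.01  # seconds before the first retry
    backoff_coefficient: float = 2.0  # multiplier applied after each failure


async def run_activity(fn, *args, policy: RetryPolicy = RetryPolicy()):
    """Run one step, retrying transient failures with exponential backoff."""
    delay = policy.initial_interval
    for attempt in range(1, policy.max_attempts + 1):
        try:
            return await fn(*args)
        except RateLimitError:
            if attempt == policy.max_attempts:
                raise  # retries exhausted: surface the failure to the caller
            await asyncio.sleep(delay)
            delay *= policy.backoff_coefficient


calls = {"count": 0}


async def call_model_server(prompt: str) -> str:
    """Hypothetical flaky step: fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise RateLimitError("try again later")
    return f"summary of {prompt!r}"


async def store_result(text: str) -> str:
    """Hypothetical second step that always succeeds."""
    return text.upper()


async def agent_flow(prompt: str) -> str:
    # The flow is a plain sequence of small steps, each retried independently.
    summary = await run_activity(call_model_server, prompt)
    return await run_activity(store_result, summary)


result = asyncio.run(agent_flow("asset-123"))
```

Because each step is retried on its own, a rate-limit error in the model call does not force the whole flow to restart.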

Infrastructure considerations:
- Cold starts for agents (and any asset processing jobs related to that agent) are rough.
- RunPod Serverless could tell us to come back later if it cannot provision any workers that match our configuration.

Proposal

Use Temporal to build and run the agents themselves, rather than just the infrastructure.

Infrastructure:
1. Model Server (RunPod Serverless for Dev -> Pod for Production)
   - All ML inference (and GPU-bound workflows) can be performed on this model server, so it is the only Pod that will require GPUs.
2. Temporal Workers (RunPod Pod)
   - We can start with 1 (cheap) Pod that provisions a worker with all workflows and activities registered in it.
   - 1 Worker + Many Workflows gives us serverless-like efficiencies.
   - Shared tooling (e.g. utility functions, Pydantic models) between workflows/agents is easier to enable; otherwise we would need an SDK.
   - When we have live customers, we could move any 'critical workflows' to their own worker(s) to ensure availability of the services/workflows in the critical path.
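The worker plan above can be sketched as a mapping from task queues to workflows, assuming we separate workers by Temporal task queue (each worker polls one task queue). This is a standard-library illustration only: the queue names, the workflow names, and the `plan_task_queues` helper are hypothetical and do not reflect an agreed layout.

```python
from collections import defaultdict

DEFAULT_QUEUE = "shared-workers"  # hypothetical task queue for the single cheap Pod


def plan_task_queues(workflows, critical=frozenset()):
    """Group workflow names by the task queue their worker would poll."""
    plan = defaultdict(list)
    for name in workflows:
        # Critical workflows get a dedicated queue, and so a dedicated worker.
        queue = f"critical-{name}" if name in critical else DEFAULT_QUEUE
        plan[queue].append(name)
    return dict(plan)


# Hypothetical workflow names for illustration.
workflows = ["agent_chat", "asset_processing", "asset_indexing"]

# Day one: every workflow is served by the one shared worker.
day_one = plan_task_queues(workflows)

# With live customers: the critical-path workflow moves to its own worker.
with_customers = plan_task_queues(workflows, critical={"agent_chat"})
```

Under this assumption, moving a workflow onto its own worker is a routing change rather than a rewrite of the workflow itself.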

Dependencies

- Using traditional hosting rather than serverless to host our agents.
- Refactoring agents to become Temporal workflows.
- Sign-off on the increased costs.

Decision

Do we implement Agents and Asset Processing as Temporal Workflows and start using traditional hosting?

Alternatives Considered

- Running Temporal Workers serverlessly and waking the worker on a particular set of events
- Deploying agents to RunPod as-is
  - Feels very monolithic
  - Difficult to enforce common patterns and share tooling

Diagrams

Alt text