
Amit Hariyale
Full Stack Web Developer, Gigawave

How you run AI agents on Cloudflare matters in real projects: weak implementation choices create hard-to-debug failures and an inconsistent user experience.
This guide uses focused, production-oriented steps and code examples grounded in official references.
Key terms:
- Cloudflare Workers AI: serverless GPU inference service running on Cloudflare's edge network
- Durable Objects: Cloudflare's strongly consistent, stateful compute primitive for Workers
- AI binding: runtime-injected client for Workers AI, providing authenticated access without API keys
- Tool calling: pattern in which an LLM outputs structured function calls for external execution
- Context window: maximum token limit for model input

Common problems:
- Unclear setup path for running AI agents on Cloudflare
- Inconsistent implementation patterns
- Missing validation for edge cases

Guiding principles:
- Keep the implementation modular and testable
- Use one clear source of truth for configuration
- Validate behavior before optimizing

Step 1 defines prerequisites and expected behavior. We start with minimal setup, then move to implementation patterns and validation checkpoints.
Apply a step-by-step architecture: setup, core implementation, validation, and performance checks.
Treat the build as iterative: get a baseline working first, then add reliability and performance hardening.
You built a prototype AI agent that works perfectly on your laptop. Then you try to deploy it. Suddenly you're managing GPU instances, wrestling with cold starts, and watching your infrastructure bill spiral. The edge was supposed to solve this, but most platforms still force you to choose between latency and complexity.
Cloudflare Workers AI changes the calculation. You can run inference directly on their edge network—no containers to manage, no clusters to provision, and response times measured in milliseconds for users worldwide. This isn't a future promise; it's production-ready now with models from Meta, Mistral, and others available via API.
Cloudflare Workers AI provides serverless GPU inference at the edge. Instead of shipping model weights to your infrastructure, you call models hosted on Cloudflare's network. Your agent logic runs in a Worker (V8 isolate), which can chain multiple inference calls, maintain state via Durable Objects or KV, and respond to requests globally with minimal latency.
Prerequisites: a Cloudflare account with Workers AI access, Node.js 18 or later, and the wrangler CLI installed and authenticated.
The deployment trap: Local AI development often uses Ollama, LM Studio, or direct API calls to OpenAI. These patterns break when you need global scale. Self-hosting requires GPU machines, model serving infrastructure, and operational expertise most teams lack.
Edge-specific failure points: per-request CPU and execution-time limits, stateless isolates that lose conversation history between requests, model rate limits or temporary unavailability, and context-window overflows.
Symptoms you'll recognize: Agents that time out, inconsistent response times across regions, ballooning infrastructure costs, or architectural complexity that slows iteration.
We'll build a stateful AI agent using Cloudflare Workers AI for inference, Durable Objects for conversation memory, and a simple tool-calling pattern. This approach keeps compute at the edge, maintains sub-100ms response times for cached regions, and scales to zero when idle.
Why this over alternatives: no GPU provisioning or model-serving infrastructure to operate, scale-to-zero economics when idle, and inference co-located with request handling for low latency worldwide.
Create a new Worker project and enable AI binding.
mkdir cf-ai-agent && cd cf-ai-agent
wrangler init --yes

Add the AI binding to wrangler.toml:
name = "ai-agent"
main = "src/index.ts"
compatibility_date = "2024-06-14"

[ai]
binding = "AI"

Install dependencies:
npm install @cloudflare/ai

Implement a Worker that calls Workers AI with the Llama 3.1 model.
// src/index.ts
import { Ai } from '@cloudflare/ai';

export interface Env {
  AI: Ai;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const ai = new Ai(env.AI);

    const messages = [
      { role: 'system', content: 'You are a helpful assistant.' },
      { role: 'user', content: 'Explain edge computing in one sentence.' }
    ];

    const response = await ai.run('@cf/meta/llama-3.1-8b-instruct', {
      messages,
      max_tokens: 100,
    });

    return Response.json(response);
  },
};

Deploy and test:
wrangler deploy
curl https://ai-agent.YOUR_SUBDOMAIN.workers.dev

Create a Durable Object to maintain conversation history across requests.
// src/AgentSession.ts
import { Ai } from '@cloudflare/ai';
import type { Env } from './index';

export class AgentSession {
  private state: DurableObjectState;
  private messages: Array<{ role: string, content: string }> = [];

  constructor(state: DurableObjectState) {
    this.state = state;
    // Note: storage.get() is async, so history is loaded in fetch(), not here.
  }

  async fetch(request: Request, env: Env): Promise<Response> {
    const ai = new Ai(env.AI);
    const { message } = await request.json();

    // Load persisted history before appending the new turn
    const stored = await this.state.storage.get<Array<{ role: string, content: string }>>('messages');
    this.messages = stored || [];
    this.messages.push({ role: 'user', content: message });

    // Keep context window manageable
    const contextWindow = this.messages.slice(-10);

    const response = await ai.run('@cf/meta/llama-3.1-8b-instruct', {
      messages: [
        { role: 'system', content: 'You are a helpful assistant with tool access.' },
        ...contextWindow
      ],
      max_tokens: 500,
    });

    this.messages.push({ role: 'assistant', content: response.response });
    await this.state.storage.put('messages', this.messages);

    return Response.json({ reply: response.response });
  }
}

Update wrangler.toml:
[[durable_objects.bindings]]
name = "AGENT_SESSION"
class_name = "AgentSession"

[[migrations]]
tag = "v1"
new_classes = ["AgentSession"]

Add structured tool use so your agent can interact with external systems.
// src/tools.ts
export const tools = [
  {
    name: 'get_weather',
    description: 'Get current weather for a location',
    parameters: {
      type: 'object',
      properties: {
        location: { type: 'string', description: 'City name' }
      },
      required: ['location']
    }
  }
];

export async function executeTool(name: string, args: any): Promise<string> {
  if (name === 'get_weather') {
    // Call external weather API or cache
    return `Weather in ${args.location}: 72°F, sunny`;
  }
  throw new Error(`Unknown tool: ${name}`);
}

Integrate into the agent loop:
// In AgentSession.fetch()
const toolPrompt = `You have access to tools: ${JSON.stringify(tools)}.
Respond with JSON: {"tool": "name", "args": {...}} or {"response": "..."}`;

const aiResponse = await ai.run('@cf/meta/llama-3.1-8b-instruct', {
  messages: [
    { role: 'system', content: toolPrompt },
    ...contextWindow
  ],
  max_tokens: 500,
});

// Parse and handle tool calls
let parsed;
try {
  parsed = JSON.parse(aiResponse.response);
} catch {
  return Response.json({ reply: aiResponse.response });
}

if (parsed.tool) {
  const toolResult = await executeTool(parsed.tool, parsed.args);
  this.messages.push({ role: 'assistant', content: `Tool result: ${toolResult}` });
  // Re-run for final response
}

Snippet 1: Basic Worker with AI Binding
import { Ai } from '@cloudflare/ai';

export interface Env {
  AI: Ai;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const ai = new Ai(env.AI);

    const response = await ai.run('@cf/meta/llama-3.1-8b-instruct', {
      messages: [
        { role: 'system', content: 'You are helpful.' },
        { role: 'user', content: 'Hello!' }
      ],
      max_tokens: 100,
    });

    return Response.json(response);
  },
};

Snippet 2: Durable Object for Stateful Sessions
import { Ai } from '@cloudflare/ai';

export interface Env {
  AI: Ai;
}

export class AgentSession {
  private state: DurableObjectState;
  private messages: Array<{ role: string, content: string }> = [];

  constructor(state: DurableObjectState) {
    this.state = state;
  }

  async fetch(request: Request, env: Env): Promise<Response> {
    const ai = new Ai(env.AI);
    const { message } = await request.json();

    const stored = await this.state.storage.get<Array<any>>('messages');
    this.messages = stored || [];
    this.messages.push({ role: 'user', content: message });

    const response = await ai.run('@cf/meta/llama-3.1-8b-instruct', {
      messages: this.messages.slice(-12),
      max_tokens: 500,
    });

    this.messages.push({ role: 'assistant', content: response.response });
    await this.state.storage.put('messages', this.messages);

    return Response.json({ reply: response.response, messages: this.messages.length });
  }
}

Snippet 3: Wrangler Configuration
name = "ai-agent"
main = "src/index.ts"
compatibility_date = "2024-06-14"

[ai]
binding = "AI"

[[durable_objects.bindings]]
name = "AGENT_SESSION"
class_name = "AgentSession"

[[migrations]]
tag = "v1"
new_classes = ["AgentSession"]

Snippet 4: Router to Durable Objects
import { Ai } from '@cloudflare/ai';
import { AgentSession } from './AgentSession';

export interface Env {
  AI: Ai;
  AGENT_SESSION: DurableObjectNamespace<AgentSession>;
}

export { AgentSession };

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const url = new URL(request.url);
    const sessionId = url.pathname.slice(1) || 'default';

    const id = env.AGENT_SESSION.idFromName(sessionId);
    const session = env.AGENT_SESSION.get(id);

    return session.fetch(request);
  },
};

Key mechanics: idFromName maps each session ID deterministically to a single Durable Object instance, so every request for a session reaches the same object; that instance processes requests serially and persists history through its storage API.
What can go wrong:
Expected behavior: Sub-200ms response times for cached regions, automatic failover to available inference nodes, zero-downtime deployments on wrangler deploy.
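To check the latency target empirically, collect per-request timings (for example by wrapping fetch calls with performance.now()) and summarize them. A minimal percentile helper is sketched below; the function name and nearest-rank method are illustrative choices, not part of Cloudflare's tooling:

```typescript
// Nearest-rank percentile over a list of latency samples (milliseconds).
export function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  // Index of the p-th percentile in the sorted list (nearest-rank method).
  const idx = Math.min(sorted.length - 1, Math.ceil((p / 100) * sorted.length) - 1);
  return sorted[Math.max(0, idx)];
}
```

Feed it timings from several regions; if the p50 for cached regions drifts above 200 ms, investigate before adding features.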
Long-running agent loops: If your agent chains multiple tool calls, you may hit Worker CPU-time and request-duration limits (the exact budgets vary by plan; check the current Workers limits documentation). Use waitUntil for background processing or break chains into multiple requests.
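One way to keep chains bounded is an explicit iteration cap in the agent loop, surfacing a truncated flag so the client can continue in a fresh request. This is a sketch under assumed shapes; ToolCall and StepResult are illustrative types, not Cloudflare APIs:

```typescript
// Illustrative types for one step of an agent loop.
type ToolCall = { tool: string; args: Record<string, unknown> };
type StepResult = { done: true; reply: string } | { done: false; call: ToolCall };

// Run the loop but never exceed maxIterations tool calls in one request.
export function runAgentLoop(
  step: (iteration: number) => StepResult,
  maxIterations = 5,
): { reply: string; truncated: boolean } {
  for (let i = 0; i < maxIterations; i++) {
    const result = step(i);
    if (result.done) return { reply: result.reply, truncated: false };
    // A real implementation would execute result.call here and feed the
    // tool output back into the next step's messages.
  }
  // Hit the cap: return a partial answer instead of timing out.
  return { reply: 'Stopped after max tool iterations.', truncated: true };
}
```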
Concurrent session access: Durable Objects process one request at a time per instance. High-frequency updates to the same session serialize automatically—design for this or shard sessions.
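If one session becomes a hot spot, a sketch of sharding is to spread it across several Durable Object names with a deterministic hash. The FNV-1a hash and the "session:shard" naming scheme below are assumptions for illustration, not a Cloudflare API:

```typescript
// Map a session ID to one of `shards` deterministic Durable Object names
// using the FNV-1a 32-bit hash.
export function shardName(sessionId: string, shards: number): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < sessionId.length; i++) {
    hash ^= sessionId.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep unsigned 32-bit
  }
  return `${sessionId}:${hash % shards}`;
}
```

In the router you would pass shardName(sessionId, N) to env.AGENT_SESSION.idFromName(...); the trade-off is that reads spanning the whole session must then merge shards.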
Model unavailability: Cloudflare may rate-limit or temporarily remove models. Implement fallback logic:
const models = ['@cf/meta/llama-3.1-8b-instruct', '@cf/mistral/mistral-7b-instruct-v0.2'];
for (const model of models) {
  try {
    return await ai.run(model, params);
  } catch (e) {
    if (e instanceof Error && e.message.includes('model')) continue;
    throw e;
  }
}

Token overflow: Models have fixed context windows. Exceeding them throws errors. Always truncate or summarize history before sending.
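A simple truncation sketch: drop the oldest messages until an estimated token count fits a budget. The 4-characters-per-token ratio and the per-message overhead are rough heuristics, not an exact tokenizer; use a real tokenizer for production budgeting:

```typescript
type Msg = { role: string; content: string };

// Keep the newest messages whose estimated token cost fits within maxTokens.
export function fitContext(messages: Msg[], maxTokens: number): Msg[] {
  // Rough estimate: ~4 characters per token, plus a small per-message overhead.
  const estimate = (m: Msg) => Math.ceil(m.content.length / 4) + 4;
  const kept: Msg[] = [];
  let total = 0;
  // Walk newest-first so the most recent turns always survive.
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimate(messages[i]);
    if (total + cost > maxTokens) break;
    kept.unshift(messages[i]);
    total += cost;
  }
  return kept;
}
```

Call fitContext(this.messages, budget) in place of a fixed slice(-10) so the window adapts to message length rather than message count.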
Running AI agents at the edge removes the infrastructure barrier that slows most teams down. You get global distribution, automatic scaling, and sub-100ms response times without managing a single GPU. The tradeoff is less control over model versions and some constraints on execution time—constraints that force better architecture.
Start with a simple stateless agent, add Durable Objects when you need memory, then layer in tool calling. Measure token costs early; they're predictable but not zero. The platform is mature enough for production—what matters now is your agent's logic, not your infrastructure.
Next step: Deploy the code from Step 2, then modify it to call a real API via tool calling. Ship it.