Research notes from ax-llm/ax

Ax from 1 to 100

Ax is a TypeScript framework that brings the DSPy programming model to AI apps: declare typed input-output signatures, let Ax generate prompts, call any supported model, parse and validate results, then compose the same programs into tools, agents, workflows, and optimizers.

1 core idea: signatures over prompts
15+ providers behind one API
4 main layers: Gen, Agent, Flow, Optimize
0 core package dependencies

1. Mental Model

Ax is easiest to understand as a compiler and runtime for typed AI programs. You describe what data goes in and what data should come out. Ax turns that contract into a provider-specific prompt, streams and parses the model response, validates it, retries when possible, and returns a typed TypeScript object.

Signature → Prompt → Provider → Parser + Validation → Typed Output
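
To make the pipeline concrete, here is a minimal sketch in plain TypeScript of the first and last stages: rendering a prompt from a contract and parsing a reply back into a typed object. It does not use Ax and does not reflect Ax's internals; the `Contract` shape, `renderPrompt`, `parseOutput`, and the "name: value" reply format are all assumptions made for illustration, and a canned string stands in for the provider call.

```typescript
// Hypothetical, simplified contract. NOT Ax's internal representation.
type FieldType = "string" | "number";
interface Field { name: string; type: FieldType }
interface Contract { inputs: Field[]; outputs: Field[] }

// Stage 1: turn the contract plus concrete inputs into a prompt.
function renderPrompt(contract: Contract, values: Record<string, unknown>): string {
  const inputLines = contract.inputs.map((f) => `${f.name}: ${String(values[f.name])}`);
  const outputLines = contract.outputs.map((f) => `${f.name}: <${f.type}>`);
  return [
    "Fill in the output fields.",
    "Inputs:",
    ...inputLines,
    "Respond with one line per output field:",
    ...outputLines,
  ].join("\n");
}

// Final stage: parse "name: value" lines and coerce each by its declared type.
function parseOutput(contract: Contract, raw: string): Record<string, string | number> {
  const parsedFields: Record<string, string | number> = {};
  for (const field of contract.outputs) {
    const line = raw.split("\n").find((l) => l.startsWith(`${field.name}:`));
    if (line === undefined) throw new Error(`Missing output field: ${field.name}`);
    const text = line.slice(field.name.length + 1).trim();
    if (field.type === "number") {
      const n = Number(text);
      if (Number.isNaN(n)) throw new Error(`Field ${field.name} is not a number`);
      parsedFields[field.name] = n;
    } else {
      parsedFields[field.name] = text;
    }
  }
  return parsedFields;
}

const qaContract: Contract = {
  inputs: [{ name: "userQuestion", type: "string" }],
  outputs: [
    { name: "answer", type: "string" },
    { name: "confidence", type: "number" },
  ],
};

const renderedPrompt = renderPrompt(qaContract, { userQuestion: "What is TypeScript?" });
// A canned reply stands in for the provider call.
const cannedReply = "answer: A typed superset of JavaScript.\nconfidence: 0.9";
const parsedReply = parseOutput(qaContract, cannedReply);
console.log(renderedPrompt);
console.log(parsedReply);
```

The point of the sketch is the division of labor: the caller only writes the contract, and everything between the contract and the typed object is mechanical.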

What Ax replaces

Manual prompt strings, hand-written JSON parsing, brittle output repair, provider-specific SDK code, and ad hoc prompt versioning.

What Ax does not replace

Good task design, clean input data, eval datasets, sensible tools, safe permissions, and domain-specific success metrics.

2. Setup

Install the core package and create an AI service with the provider you want. The same Ax program can usually switch providers by changing the service config.

npm install @ax-llm/ax

import { ai, ax } from "@ax-llm/ax";

const llm = ai({
  name: "openai",
  apiKey: process.env.OPENAI_APIKEY!,
});

const classify = ax('review:string -> sentiment:class "positive, negative, neutral"');

const result = await classify.forward(llm, {
  review: "This product is amazing!",
});

console.log(result.sentiment); // typed as "positive" | "negative" | "neutral"

3. Signatures

Signatures are Ax's central abstraction. They define input fields, output fields, types, optionality, validation constraints, and sometimes prompt hints. If you understand signatures, the rest of Ax becomes composition.

Syntax

[description] input1:type, input2:type -> output1:type, output2:type
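
To show how much structure that one-line syntax carries, here is an illustrative parser for a small subset of it. This is a sketch, not Ax's parser: `parseSignature` and `SigField` are hypothetical names, and the subset covers only a field name, an optional `?`, a type, and a trailing `[]`. Leading descriptions and quoted class options are deliberately not handled.

```typescript
// One parsed field from the signature string.
interface SigField { name: string; type: string; optional: boolean; isArray: boolean }
interface ParsedSignature { inputs: SigField[]; outputs: SigField[] }

// Parses a single "name?:type[]" fragment.
function parseField(text: string): SigField {
  const match = /^(\w+)(\?)?:(\w+)(\[\])?$/.exec(text.trim());
  if (match === null) throw new Error(`Cannot parse field: "${text.trim()}"`);
  return {
    name: match[1],
    optional: match[2] === "?",
    type: match[3],
    isArray: match[4] === "[]",
  };
}

// Splits on "->" into inputs and outputs, then on "," into fields.
function parseSignature(signature: string): ParsedSignature {
  const parts = signature.split("->");
  if (parts.length !== 2) throw new Error('Expected exactly one "->"');
  const toFields = (side: string): SigField[] => side.split(",").map(parseField);
  return { inputs: toFields(parts[0]), outputs: toFields(parts[1]) };
}

const parsedSignature = parseSignature(
  "customerEmail:string, currentDate:datetime -> ticketNumber?:number, nextSteps:string[]",
);
console.log(parsedSignature.outputs);
```

Everything downstream (prompt rendering, parsing, validation, TypeScript types) can be derived from a structure like `ParsedSignature`, which is why clear field names and types matter so much.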

String DSL

Best for simple programs. Example: question:string -> answer:string.

f() Builder

Best for nested objects, descriptions, modifiers, and constraints.

Standard Schema

Use Zod, Valibot, or ArkType for schema-first type inference and validation.

Common Types

string, number, boolean, json, date, datetime, url, code, class, image, audio, file

import { ax, f } from "@ax-llm/ax";

const extractor = ax(`
  customerEmail:string, currentDate:datetime ->
  priority:class "high, normal, low",
  sentiment:class "positive, negative, neutral",
  ticketNumber?:number,
  nextSteps:string[]
`);

const productExtractor = ax(
  f()
    .input("productPage", f.string("Raw product page text"))
    .output("product", f.object({
      name: f.string(),
      price: f.number(),
      specs: f.object({
        materials: f.string().array(),
      }),
    }))
    .build(),
);

Rule of thumb: use string signatures until they become hard to read. Move to f() or Zod when you need nested objects, constraints, or shared schemas.

4. AxGen

ax(...) creates an AxGen: a single-step AI program. It is the basic unit for extraction, classification, rewriting, scoring, summarization, multimodal analysis, and tool-using ReAct-style calls.

const qa = ax('userQuestion:string -> answer:string, confidence:number');

const { answer, confidence } = await qa.forward(llm, {
  userQuestion: "What is TypeScript?",
});

Generation Lifecycle

  1. Render a prompt from the signature and inputs.
  2. Call the selected AI provider.
  3. Stream and parse output fields.
  4. Validate field values and schema constraints.
  5. Retry with correction feedback when validation fails.
  6. Return the typed final object.
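
Steps 2 through 6 of the lifecycle above can be sketched as a validate-and-retry loop. This is a plain-TypeScript illustration under stated assumptions, not Ax's implementation: `generateWithRetry`, `ModelCall`, and the feedback wording are hypothetical, and the model is a synchronous stub (a real provider call would be asynchronous).

```typescript
// Stand-in for a provider call: takes a prompt, returns raw text.
type ModelCall = (promptText: string) => string;

interface Attempt<T> { value: T; attempts: number }

function generateWithRetry<T>(
  basePrompt: string,
  callModel: ModelCall,
  parseAndValidate: (raw: string) => T, // throws with a readable message on failure
  maxAttempts = 3,
): Attempt<T> {
  let promptText = basePrompt;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = callModel(promptText);
    try {
      return { value: parseAndValidate(raw), attempts: attempt };
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      // Feed the failure back so the next attempt can correct it.
      promptText = `${basePrompt}\n\nPrevious reply was rejected: ${reason}\nPlease fix it.`;
    }
  }
  throw new Error(`No valid output after ${maxAttempts} attempts`);
}

const allowedSentiments = ["positive", "negative", "neutral"] as const;
type Sentiment = (typeof allowedSentiments)[number];

function validateSentiment(raw: string): Sentiment {
  const value = raw.trim().toLowerCase();
  const hit = allowedSentiments.find((s) => s === value);
  if (hit === undefined) {
    throw new Error(`"${value}" is not one of ${allowedSentiments.join(", ")}`);
  }
  return hit;
}

// Fake model: wrong on the first call, correct once it sees the feedback.
const fakeModel: ModelCall = (promptText) =>
  promptText.includes("rejected") ? "positive" : "Very happy!";

const outcome = generateWithRetry(
  "Classify: This product is amazing!",
  fakeModel,
  validateSentiment,
);
console.log(outcome); // value is "positive", reached on the second attempt
```

The design choice worth noticing is that the validator's error message doubles as the correction prompt, so precise validation errors tend to produce better retries.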

5. Providers

Ax exposes provider services through ai({ name }). The README lists OpenAI, Anthropic, Google Gemini, Mistral, Cohere, Groq, Together, Ollama, OpenRouter, Bedrock via optional package, Reka, DeepSeek, Grok, HuggingFace, and WebLLM.

const openai = ai({ name: "openai", apiKey: process.env.OPENAI_APIKEY! });
const anthropic = ai({ name: "anthropic", apiKey: process.env.ANTHROPIC_APIKEY! });
const gemini = ai({ name: "google-gemini", apiKey: process.env.GOOGLE_APIKEY! });

// Same program, different runtime service.
const review = "This product is amazing!";
await classify.forward(openai, { review });
await classify.forward(anthropic, { review });
await classify.forward(gemini, { review });

Provider portability is not model equivalence. The code can switch easily, but quality, tool support, audio support, context caching, reasoning modes, and latency differ by provider and model.
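
The distinction between portability and equivalence can be sketched with a tiny interface and two stub services. Nothing here is Ax's API: `ChatService`, `supportsTools`, and both stubs are hypothetical, and the stubs return canned text rather than calling a network.

```typescript
// Hypothetical minimal interface; real provider services expose far more.
interface ChatService {
  readonly providerName: string;
  readonly supportsTools: boolean;
  complete(promptText: string): string;
}

// Two stubs with different raw output styles and different capabilities.
const stubA: ChatService = {
  providerName: "stub-a",
  supportsTools: true,
  complete: () => "positive",
};
const stubB: ChatService = {
  providerName: "stub-b",
  supportsTools: false,
  complete: () => "Positive.",
};

// The program depends only on the interface, so services are interchangeable,
// and it normalizes the reply because raw output differs between services.
function classifyReview(service: ChatService, review: string): string {
  const raw = service.complete(`Classify the sentiment of: ${review}`);
  return raw.trim().toLowerCase().replace(/[.!]+$/, "");
}

// Capabilities are not portable: check them before relying on them.
function requireTools(service: ChatService): void {
  if (!service.supportsTools) {
    throw new Error(`${service.providerName} lacks tool support`);
  }
}

console.log(classifyReview(stubA, "Great product"));
console.log(classifyReview(stubB, "Great product"));
```

Swapping the service is a one-argument change, but a capability check like `requireTools` is where the differences surface.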

6. Streaming, Validation, and Retries

Ax is streaming-first. Even when you call forward() and only need a final object, Ax can parse fields as they arrive. With streamingForward(), callers can consume output incrementally.

const writer = ax('topic:string -> title:string, outline:string[], article:string');

for await (const chunk of writer.streamingForward(llm, { topic: "AI evals" })) {
  console.log(chunk);
}

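Validation constraints such as min, max, email, url, regex, and enum/class choices feed the retry pipeline: when a field fails a constraint, Ax can return the error to the model as correction feedback and retry. This is one of Ax's biggest practical advantages over raw provider SDK calls.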
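
Parsing fields as they arrive can be illustrated with a small generator that buffers chunks and emits a field once its line is complete. This is a sketch of the general idea only: the "name: value" line format, `parseFieldStream`, and `FieldEvent` are assumptions, and Ax's actual streaming parser is not shown here.

```typescript
interface FieldEvent { field: string; value: string }

// Converts a complete "name: value" line into an event, or null if it is not one.
function toEvent(line: string): FieldEvent | null {
  const colon = line.indexOf(":");
  if (colon <= 0) return null;
  return { field: line.slice(0, colon).trim(), value: line.slice(colon + 1).trim() };
}

// Buffers chunks and emits a field as soon as its line is complete.
function* parseFieldStream(chunks: Iterable<string>): Generator<FieldEvent> {
  let buffer = "";
  for (const chunk of chunks) {
    buffer += chunk;
    let newline = buffer.indexOf("\n");
    while (newline !== -1) {
      const completed = toEvent(buffer.slice(0, newline));
      buffer = buffer.slice(newline + 1);
      if (completed !== null) yield completed;
      newline = buffer.indexOf("\n");
    }
  }
  // Flush whatever remains once the stream ends.
  const last = toEvent(buffer);
  if (last !== null) yield last;
}

// Chunk boundaries fall mid-word and mid-field name, as they would on a real stream.
const streamedChunks = [
  "title: AI ev",
  "als 101\nout",
  "line: intro; metrics\n",
  "article: Evals matter",
];
const fieldEvents = Array.from(parseFieldStream(streamedChunks));
console.log(fieldEvents);
```

A UI can render `title` while `article` is still being generated, which is the practical benefit of field-level streaming.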

7. Tools and Function Calling

Ax programs can call host functions. For modern code, define tools with fn() for typed arguments, typed returns, examples, namespaces, and schema validation.

import { ax, f, fn } from "@ax-llm/ax";

const searchDocs = fn("searchDocs")
  .description("Search indexed documentation")
  .namespace("kb")
  .arg("query", f.string("Search query"))
  .arg("limit", f.number("Max results").optional())
  .returns(f.string("Matching snippets").array())
  .handler(async ({ query, limit }) => {
    // mySearchIndex is a placeholder for your own search backend.
    return mySearchIndex.search(query, { limit: limit ?? 5 });
  })
  .build();

const assistant = ax("question:string -> answer:string", {
  functions: [searchDocs],
});

Use tool calls for things the model should not invent: current data, private state, database lookups, calculations, side effects, and calls to specialist systems.
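
The value of typed, namespaced tools is easiest to see on the host side, where a model-proposed call has to be checked before any code runs. The sketch below is plain TypeScript and not Ax's function-calling machinery: `ToolRegistry`, `ToolSpec`, and the in-memory document list are hypothetical.

```typescript
type ArgType = "string" | "number";

interface ToolSpec {
  namespace: string;
  toolName: string;
  args: Record<string, { type: ArgType; optional?: boolean }>;
  handler: (args: Record<string, unknown>) => unknown;
}

class ToolRegistry {
  private readonly tools = new Map<string, ToolSpec>();

  register(spec: ToolSpec): void {
    this.tools.set(`${spec.namespace}.${spec.toolName}`, spec);
  }

  // Validate a model-proposed call before running any host code.
  call(qualifiedName: string, args: Record<string, unknown>): unknown {
    const spec = this.tools.get(qualifiedName);
    if (spec === undefined) throw new Error(`Unknown tool: ${qualifiedName}`);
    for (const [argName, rule] of Object.entries(spec.args)) {
      const value = args[argName];
      if (value === undefined) {
        if (rule.optional) continue;
        throw new Error(`Missing argument: ${argName}`);
      }
      if (typeof value !== rule.type) {
        throw new Error(`Argument ${argName} must be a ${rule.type}`);
      }
    }
    return spec.handler(args);
  }
}

const registry = new ToolRegistry();
registry.register({
  namespace: "kb",
  toolName: "searchDocs",
  args: { query: { type: "string" }, limit: { type: "number", optional: true } },
  // Hypothetical in-memory "index" so the sketch runs without a search backend.
  handler: ({ query, limit }) =>
    ["ax signatures", "ax flows", "ax agents"]
      .filter((doc) => doc.includes(String(query)))
      .slice(0, typeof limit === "number" ? limit : 5),
});

console.log(registry.call("kb.searchDocs", { query: "ax", limit: 2 }));
```

Rejecting unknown names and malformed arguments at this boundary means a hallucinated call fails loudly instead of reaching a database or an API.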

8. AxAgent

agent(...) turns a signature into a long-running, tool-using actor. The README describes a three-stage pipeline: distiller, executor, responder. The executor can run an RLM (Recursive Language Model) loop with a sandboxed JavaScript runtime, tools, child agents, memories, and skills.

Inputs → Distiller → Executor → Tools / Runtime → Responder

import { agent, AxJSRuntime } from "@ax-llm/ax";

const analyzer = agent(
  "context:string, query:string -> answer:string, evidence:string[]",
  {
    agentIdentity: {
      name: "Document Analyzer",
      description: "Analyzes long documents with iterative sub-queries",
    },
    contextFields: ["context"],
    runtime: new AxJSRuntime(),
    maxTurns: 20,
    contextPolicy: { preset: "checkpointed", budget: "balanced" },
  },
);

// veryLongDocument is a placeholder for a string holding the full document text.
const result = await analyzer.forward(llm, {
  context: veryLongDocument,
  query: "What are the main arguments and evidence?",
});

When to use an agent

  • The task needs multiple tool calls or branching decisions.
  • The model must inspect large context gradually.
  • You want child agents with distinct responsibilities.
  • You need clarification, runtime state, memories, or dynamic skills.

When not to use an agent

  • A single extraction, classification, or rewrite is enough.
  • The task has a deterministic pipeline better represented as AxFlow.
  • You do not have a clear tool boundary or stopping condition.
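
The last point, a clear stopping condition, is easier to judge with the bare loop in view. The following is a generic tool-using loop in plain TypeScript, not AxAgent's implementation: `runAgent`, `Policy`, and the scripted policy are hypothetical, and the policy stands in for the model's turn-by-turn decisions.

```typescript
// One step the policy (a stand-in for the model) can choose.
type AgentStep =
  | { kind: "tool"; tool: string; input: string }
  | { kind: "final"; answer: string };

type Policy = (question: string, observations: string[]) => AgentStep;
type ToolTable = Map<string, (input: string) => string>;

function runAgent(
  question: string,
  policy: Policy,
  tools: ToolTable,
  maxTurns: number,
): string {
  const observations: string[] = [];
  for (let turn = 0; turn < maxTurns; turn++) {
    const step = policy(question, observations);
    if (step.kind === "final") return step.answer; // explicit stopping condition
    const tool = tools.get(step.tool);
    observations.push(
      tool !== undefined ? tool(step.input) : `error: unknown tool ${step.tool}`,
    );
  }
  // Turn budget exhausted: fail instead of looping forever.
  throw new Error(`No final answer within ${maxTurns} turns`);
}

// Scripted policy: look something up once, then answer from the observation.
const scriptedPolicy: Policy = (_question, observations) =>
  observations.length === 0
    ? { kind: "tool", tool: "lookup", input: "refund policy" }
    : { kind: "final", answer: `Based on: ${observations[0]}` };

const agentTools: ToolTable = new Map([
  ["lookup", (topic: string) => `${topic}: 30 days`],
]);

const agentAnswer = runAgent("How long do refunds take?", scriptedPolicy, agentTools, 5);
console.log(agentAnswer);
```

If you cannot say what the `final` branch looks like for your task, or which tools belong in the table, the task is probably not ready for an agent.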

9. AxFlow

flow() is a typed workflow builder. Define nodes, map state into node inputs, let independent nodes auto-parallelize, then return a final typed output.

import { flow } from "@ax-llm/ax";

const emailFlow = flow<{ emailText: string }, { priority: string; rationale: string }>()
  .description("Email Priority", "Classify priority and write a short rationale")
  .node("classifier", 'emailText:string -> priority:class "high, normal, low"')
  .node("rationale", "emailText:string, priority:string -> rationale:string")
  .execute("classifier", (state) => ({ emailText: state.emailText }))
  .execute("rationale", (state) => ({
    emailText: state.emailText,
    priority: state.classifierResult.priority,
  }))
  .returns((state) => ({
    priority: state.classifierResult.priority,
    rationale: state.rationaleResult.rationale,
  }));

Use flows when the structure of the process is known: classify then route, parallel extractors then merge, iterative improve-and-check loops, map-reduce over arrays, or multi-model pipelines.
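
Auto-parallelization of independent nodes follows from dependency analysis. The sketch below shows the general technique (grouping steps into dependency "waves") in plain TypeScript. It is not AxFlow's scheduler, whose details may differ; `planWaves`, `FlowStep`, and the step names beyond the email example are hypothetical.

```typescript
interface FlowStep { id: string; dependsOn: string[] }

// Group steps into waves: every step in a wave depends only on earlier waves,
// so the steps inside one wave could run concurrently.
function planWaves(steps: FlowStep[]): string[][] {
  const done = new Set<string>();
  const waves: string[][] = [];
  let remaining = steps.slice();
  while (remaining.length > 0) {
    const ready = remaining.filter((s) => s.dependsOn.every((d) => done.has(d)));
    if (ready.length === 0) throw new Error("Cycle or missing dependency");
    waves.push(ready.map((s) => s.id));
    for (const s of ready) done.add(s.id);
    remaining = remaining.filter((s) => !done.has(s.id));
  }
  return waves;
}

const plannedWaves = planWaves([
  { id: "classifier", dependsOn: [] },
  { id: "entityExtractor", dependsOn: [] },
  { id: "rationale", dependsOn: ["classifier"] },
  { id: "merge", dependsOn: ["rationale", "entityExtractor"] },
]);
console.log(plannedWaves);
// [["classifier", "entityExtractor"], ["rationale"], ["merge"]]
```

The classifier and the extractor share no data dependency, so they land in the same wave. The rationale step must wait because it reads the classifier's result.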

10. Optimization

Ax follows DSPy's thesis: AI programs should be optimized from examples and metrics, not endlessly hand-prompted. Ax includes optimizers such as GEPA, ACE, and bootstrap few-shot.

Optimizer | Use it for | Mental model
AxGEPA | Multi-objective prompt optimization with Pareto fronts. | Search for prompts that trade off metrics like accuracy, brevity, safety, and cost.
AxACE | Playbook/curriculum-style iterative refinement. | Extract better task guidance from failures and examples.
AxBootstrapFewShot | Collecting useful demonstrations. | Turn successful traces into few-shot examples.

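import { AxGEPA } from "@ax-llm/ax";

// studentAI and teacherAI are services created with ai(...);
// trainSet and valSet are arrays of labeled examples.
const optimizer = new AxGEPA({
  studentAI,
  teacherAI,
  numTrials: 16,
  minibatch: true,
  minibatchSize: 6,
  seed: 42,
});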

const result = await optimizer.compile(
  emailFlow,
  trainSet,
  async ({ prediction, example }) => ({
    accuracy: prediction.priority === example.priority ? 1 : 0,
    brevity: prediction.rationale.length <= 60 ? 1 : 0.4,
  }),
  { auto: "medium", validationExamples: valSet, maxMetricCalls: 240 },
);

Optimization quality is metric quality. A vague or gameable metric will optimize the wrong behavior faster.
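
For multi-objective metrics like the accuracy and brevity pair above, "best" means the Pareto front: candidates that no other candidate beats on every metric. Here is a minimal sketch of that computation, assuming higher is better and that all candidates report the same metric names. The `Candidate` type and the scores are made up for illustration and are not output from any optimizer.

```typescript
interface Candidate { label: string; scores: Record<string, number> }

// a dominates b if a is at least as good on every metric and strictly better on one.
function dominates(a: Candidate, b: Candidate): boolean {
  const keys = Object.keys(a.scores);
  const noWorse = keys.every((k) => a.scores[k] >= b.scores[k]);
  const strictlyBetter = keys.some((k) => a.scores[k] > b.scores[k]);
  return noWorse && strictlyBetter;
}

// Keep every candidate that no other candidate dominates.
function paretoFront(candidates: Candidate[]): Candidate[] {
  return candidates.filter(
    (c) => !candidates.some((other) => other !== c && dominates(other, c)),
  );
}

const promptCandidates: Candidate[] = [
  { label: "terse", scores: { accuracy: 0.7, brevity: 1.0 } },
  { label: "thorough", scores: { accuracy: 0.9, brevity: 0.4 } },
  { label: "rambling", scores: { accuracy: 0.7, brevity: 0.4 } },
];

const frontLabels = paretoFront(promptCandidates).map((c) => c.label);
console.log(frontLabels); // ["terse", "thorough"]
```

"rambling" drops out because "terse" matches its accuracy and beats it on brevity. Choosing between "terse" and "thorough" is a product decision that the metric cannot make for you.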

11. Advanced Topics

RLM and sandboxed JS

AxJSRuntime lets agents execute code in a hardened sandbox. Permissions are opt-in for network, filesystem, storage, child process, and similar capabilities.
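
The opt-in model can be pictured as a deny-by-default capability check. The sketch below is not AxJSRuntime's configuration API: `SandboxPolicy`, `Capability`, and `fetchInSandbox` are hypothetical names, and the capability list simply mirrors the categories named above.

```typescript
type Capability = "network" | "filesystem" | "storage" | "childProcess";

class SandboxPolicy {
  private readonly granted: ReadonlySet<Capability>;

  // Nothing is allowed unless it is listed explicitly.
  constructor(granted: Capability[] = []) {
    this.granted = new Set(granted);
  }

  assertAllowed(capability: Capability): void {
    if (!this.granted.has(capability)) {
      throw new Error(`Permission denied: ${capability}`);
    }
  }
}

// Guarded operation: the check runs before any side effect would happen.
function fetchInSandbox(policy: SandboxPolicy, url: string): string {
  policy.assertAllowed("network");
  return `would fetch ${url}`; // placeholder; no real request is made
}

const lockedDown = new SandboxPolicy();
const networkOnly = new SandboxPolicy(["network"]);
console.log(fetchInSandbox(networkOnly, "https://example.com"));
```

With this shape, forgetting to configure permissions fails closed, which is the safer default for model-written code.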

Context maps

AxAgentContextMap stores persistent orientation for repeated questions over the same long context. Save snapshots externally with onUpdate.

Memories

Agents can call recall(...) to load vector, BM25, or KV search results. Persist memories outside Ax if they should survive across calls.

Skills

Agents can call consult(...) to load runbooks or guidance. This is useful for dynamic, task-specific instructions.

Audio and multimodal

Signatures support image, file, and audio inputs. Audio outputs are scripted artifacts: the model writes text, then Ax synthesizes speech.

Observability

Ax supports OpenTelemetry, chat logs, usage tracking, function-call callbacks, and per-turn telemetry for production debugging.

12. The 1-100 Roadmap

1-10: Understand the contract.
Build three string signatures: classification, extraction, and rewriting. Focus on clear field names and output types.
11-20: Learn provider services.
Run the same AxGen against at least two providers or models. Notice latency, quality, and tool-support differences.
21-30: Add validation.
Use classes, optional fields, arrays, dates, URLs, and constraints. Intentionally make the model fail and observe retries.
31-40: Move to f() and Zod.
Define nested objects and schemas that are too complex for comfortable string signatures.
41-50: Stream outputs.
Use streamingForward() for long generation or UI feedback. Learn where field boundaries appear.
51-60: Add tools.
Wrap one real function with fn(), namespace it, validate args, and return typed results.
61-70: Build a simple agent.
Use agent() when a task needs multiple tool calls or clarification. Keep contextFields explicit.
71-80: Orchestrate with AxFlow.
Create a typed pipeline with at least three nodes, one parallel branch, and a final returns() mapper.
81-90: Introduce evals.
Create a small train/validation set and a deterministic metric. Run optimization only after you can measure quality.
91-100: Productionize.
Add observability, cost tracking, model routing, memory/skills where useful, safety boundaries for runtime permissions, and regression evals for every critical behavior.
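
For the eval stage of the roadmap, a deterministic metric and a fixed train/validation split can be as small as the sketch below. It is plain TypeScript with made-up data: `LabeledExample`, `splitExamples`, and the keyword baseline are hypothetical, and the baseline stands in for a model-backed program.

```typescript
interface LabeledExample { emailText: string; priority: "high" | "normal" | "low" }
type Predictor = (emailText: string) => string;

// Deterministic split: no randomness, so repeated runs compare like with like.
function splitExamples<T>(
  examples: T[],
  trainFraction: number,
): { train: T[]; validation: T[] } {
  const cut = Math.floor(examples.length * trainFraction);
  return { train: examples.slice(0, cut), validation: examples.slice(cut) };
}

// Exact-match accuracy: the same inputs and predictor always give the same score.
function accuracy(predict: Predictor, examples: LabeledExample[]): number {
  if (examples.length === 0) return 0;
  const correct = examples.filter((ex) => predict(ex.emailText) === ex.priority).length;
  return correct / examples.length;
}

const labeledEmails: LabeledExample[] = [
  { emailText: "Server is down!", priority: "high" },
  { emailText: "Lunch menu for Friday", priority: "low" },
  { emailText: "Invoice attached", priority: "normal" },
  { emailText: "URGENT: data loss", priority: "high" },
];

// Keyword baseline standing in for a model-backed program.
const keywordBaseline: Predictor = (text) =>
  /urgent|down/i.test(text) ? "high" : "normal";

const { train: trainExamples, validation: validationExamples } = splitExamples(labeledEmails, 0.5);
const trainAccuracy = accuracy(keywordBaseline, trainExamples);
const validationAccuracy = accuracy(keywordBaseline, validationExamples);
console.log({ trainAccuracy, validationAccuracy });
```

A baseline like this also gives optimization something to beat: if an optimized program cannot outperform a keyword rule on the validation set, the metric or the data needs attention first.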

13. Pitfalls and Sharp Edges

Generic field names

Avoid text, data, input, and output. Use semantic names like customerEmail or riskAssessment.

Overusing agents

If the process is fixed, use AxGen or AxFlow. Agents are for dynamic reasoning, tool choice, long context, and clarification.

Bad metrics

Optimizers amplify the metric you give them. Use validation sets and watch for reward hacking.

Tool ambiguity

Give tools clear descriptions, namespaces, argument schemas, and examples. Do not rely on the model guessing function names.

Unsafe runtime permissions

AxJSRuntime is hardened by default. Add permissions only when the task actually needs them.

Provider assumptions

Provider portability does not guarantee identical outputs, token costs, audio behavior, caching, or tool semantics.

14. Resources and Notes

Sources used for these notes: the GitHub README, the package README, Ax signature reference, AxAgent reference, AxFlow reference, and DeepWiki's architecture overview for ax-llm/ax.

Version note: Ax is moving quickly. Treat this as a conceptual guide and verify exact APIs against the current package docs before copying advanced code into production.