Errors and retries

Errors happen — connectors time out, HTTP calls return 500s, JSON parsing throws. This page covers what the platform records, what it retries on its own, and what’s available to you inside run().

What gets recorded when a step throws

For every step.do callback that throws:

The step is marked failed in the run timeline, with its error message and stack.
A line is appended to the run’s event log at level error.
If the error propagates out of run() (i.e. you don’t catch it), the run itself is marked failed and the error message becomes part of the run’s output.

You don’t need to log the error explicitly — the platform captures it from the throw.

Automatic retries

The platform retries transient step failures automatically before giving up. Retries happen at the step level: a step that already succeeded is never re-run on retry. Only the failed step and any steps not yet attempted run again.

A few consequences worth knowing:

Side effects must tolerate repetition. A step that posts to an external API might post twice across a retry. If the receiving system can’t deduplicate, include an idempotency key (e.g. the runId) in the request so duplicate calls collapse on the server side.
try/catch opts out of platform retry. Catching an error tells the platform “I’ve handled this” — the step is recorded as failed but the run continues. If you’d rather let the platform retry, don’t catch.

Tuning retry counts and backoff per step isn’t exposed today; you get the platform defaults.
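
If a particular call needs more attempts or a different delay than the defaults provide, one option is to retry inside the step callback yourself. The following is a minimal sketch, assuming a hypothetical retryWithBackoff helper (it is not a platform API). Any attempts made inside the callback multiply with whatever the platform does on top.

```typescript
// Hypothetical helper, not a platform API: calls `fn` up to `attempts` times,
// doubling the wait between tries (baseDelayMs, 2x, 4x, ...).
async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 200,
): Promise<T> {
  if (attempts < 1) {
    throw new RangeError("attempts must be at least 1");
  }
  let lastError: unknown;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      // Only wait if another attempt is coming.
      if (attempt < attempts - 1) {
        const delayMs = baseDelayMs * 2 ** attempt;
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  // Every attempt failed: rethrow the last error so the caller still sees a failure.
  throw lastError;
}
```

Inside a step it might be called as step.do("Load profile", () => retryWithBackoff(loadProfile, 5, 500)), where loadProfile is a placeholder for your own function. When the last attempt fails the helper rethrows, so the step still throws and is recorded as failed.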

Handle a failure and keep going

To recover from a step failure inside the loop, wrap the step in try/catch:


let scan;
try {
  await step.do("Generate a health scan", async ({ ctx }) => {
    await ctx.FACIAL_SCAN.create({ expiresIn: 3600 });
  });
  scan = await step.waitForScan({ timeout: "30 minutes" });
} catch (error) {
  console.error("scan failed, falling back to manual review", error);
  scan = { status: "fallback" };
}
 
// Loop continues with `scan` set

What this produces:

The step is still recorded as failed in the timeline (the failure happened — the timeline doesn’t lie).
The run itself does not fail; you absorbed the error.
The fallback path inside catch runs normally and counts toward the rest of the timeline.
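
Under strict TypeScript settings the value bound in catch is typed unknown, so it helps to normalize it before logging it or attaching it to a fallback value. The following is a minimal sketch, assuming a hypothetical describeError helper and a hypothetical reason field on the fallback object (neither is part of the platform):

```typescript
// Hypothetical helper, not a platform API: turn whatever was thrown
// into a short string that is safe to log or store.
function describeError(error: unknown): string {
  if (error instanceof Error) {
    return `${error.name}: ${error.message}`;
  }
  // Non-Error throws (strings, numbers, plain objects) are stringified as-is.
  return String(error);
}

// Hypothetical fallback shape; `reason` is an illustrative field.
function fallbackScan(error: unknown): { status: "fallback"; reason: string } {
  return { status: "fallback", reason: describeError(error) };
}
```

In the catch block shown earlier, scan = fallbackScan(error) would keep the cause of the failure alongside the fallback status.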

Fail a run on purpose

To deliberately fail a run — for example, on an unrecoverable input validation error — throw from run():


async run(event: HealthEvent, step: HealthStep) {
  if (!event.body?.patientId) {
    throw new Error("patientId is required");
  }
  // …
}

The error’s message and stack are captured against the run and the run’s status becomes failed.

Returning a value from run() always marks the run as succeeded; the platform doesn’t inspect what you returned. To distinguish “succeeded with a problem” from “failed,” throw for failures and return for everything else.
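
One way to keep that split explicit is to put the decision in a single function that throws for unrecoverable input and returns a result carrying warnings for everything else. The following is a minimal sketch; the IntakeBody and IntakeResult shapes and the evaluateIntake name are hypothetical and chosen only for illustration:

```typescript
// Hypothetical shapes for illustration; they are not platform types.
interface IntakeBody {
  patientId?: string;
  email?: string;
}

interface IntakeResult {
  status: "ok" | "ok_with_warnings";
  warnings: string[];
}

function evaluateIntake(body: IntakeBody): IntakeResult {
  // Unrecoverable: throw, so a run that lets this propagate is marked failed.
  if (!body.patientId) {
    throw new Error("patientId is required");
  }

  // Recoverable: note the problem and return, so the run still succeeds.
  const warnings: string[] = [];
  if (!body.email) {
    warnings.push("email missing; skipping confirmation message");
  }

  return { status: warnings.length > 0 ? "ok_with_warnings" : "ok", warnings };
}
```

Returning evaluateIntake(event.body) from run() would then succeed the run in both the clean and the warning case, and fail it only when the function throws.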

Examples

Use a deterministic idempotency key

Derive an idempotency key from data on event.body so the same input always produces the same key. Duplicate calls across a retry can then collapse on the server side, provided the receiving API honors the key:


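async run(event: HealthEvent<IntakePayload>, step: HealthStep) {
  // Built only from input data, so a retried step sends the same key.
  const idempotencyKey = `intake:${event.body.patientId}`;

  await step.do("Send notification", async () => {
    const res = await fetch("https://api.example.com/send-notification", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "Idempotency-Key": idempotencyKey,
      },
      body: JSON.stringify(event.body),
    });
    // fetch resolves on HTTP error statuses, so throw to mark the step failed.
    if (!res.ok) {
      throw new Error(`send-notification failed: ${res.status}`);
    }
  });
}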
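
A key built from patientId alone also collapses distinct requests for the same patient. If that matters for your use case, one option is to fold more input fields into the key. The following is a minimal sketch, assuming hypothetical formId and submittedAt fields (the real IntakePayload may differ) and a hypothetical idempotencyKeyFor helper:

```typescript
// Hypothetical field set for illustration; the real payload may differ.
interface IntakeFields {
  patientId: string;
  formId: string;
  submittedAt: string; // timestamp supplied by the caller, never Date.now()
}

// Build the key only from input fields, in a fixed order, so the same
// payload maps to the same key however many times the step runs.
function idempotencyKeyFor(body: IntakeFields): string {
  return ["intake", body.patientId, body.formId, body.submittedAt]
    // Encoding each part escapes ":" so it stays unambiguous as the separator.
    .map((part) => encodeURIComponent(part))
    .join(":");
}
```

Anything that changes between attempts (the current time, a random value, an attempt counter) should stay out of the key, since it would make each retry look like a new request.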

Fail fast on bad input, retry on transient errors

Combine the two patterns — throw for input you don’t accept, let the platform retry the things that might recover on their own:


async run(event: HealthEvent<IntakePayload>, step: HealthStep) {
  if (!event.body?.patientId) {
    throw new Error("patientId is required"); // run fails immediately
  }

  await step.do("Load profile", async () => {
    // Encode the ID so unexpected characters can't change the request path.
    const patientId = encodeURIComponent(event.body.patientId);
    const res = await fetch(`https://api.example.com/patients/${patientId}`);
    if (!res.ok) {
      throw new Error(`profile lookup failed: ${res.status}`); // platform retries
    }
    return res.json();
  });
}