Errors and retries
Errors happen — connectors time out, HTTP calls return 500s, JSON parsing throws. This page covers what the platform records, what it retries on its own, and what’s available to you inside run().
What gets recorded when a step throws
For every step.do callback that throws:
- The step is marked failed in the run timeline, with its error message and stack.
- A line is appended to the run’s event log at level error.
- If the error propagates out of run() (i.e. you don’t catch it), the run itself is marked failed and the error message becomes part of the run’s output.
You don’t need to log the error explicitly — the platform captures it from the throw.
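To make the recording rules concrete, here is a toy model of a step wrapper. It is an illustration only, not the platform's implementation: `ToyRun`, `StepRecord`, and `LogLine` are hypothetical names, and the real timeline and event-log formats may differ. A failing callback gets a failed timeline entry with its message and stack, plus one error-level log line, and the error is rethrown so it can still propagate out of run().

```typescript
// Toy model only: the platform's real timeline and event-log records are not documented here.
type StepRecord = {
  name: string;
  status: "succeeded" | "failed";
  error?: { message: string; stack?: string };
};
type LogLine = { level: "info" | "error"; message: string };

class ToyRun {
  readonly timeline: StepRecord[] = [];
  readonly eventLog: LogLine[] = [];

  // Runs one step callback and records the outcome the way the list above describes.
  async do<T>(name: string, fn: () => Promise<T>): Promise<T> {
    try {
      const result = await fn();
      this.timeline.push({ name, status: "succeeded" });
      return result;
    } catch (err) {
      const e = err instanceof Error ? err : new Error(String(err));
      // Failed timeline entry with message and stack, plus one error-level log line.
      this.timeline.push({ name, status: "failed", error: { message: e.message, stack: e.stack } });
      this.eventLog.push({ level: "error", message: `${name}: ${e.message}` });
      // Rethrow unchanged so the caller can catch it or let it propagate out of run().
      throw err;
    }
  }
}
```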
Automatic retries
The platform retries transient step failures automatically before giving up. Retries happen at the step level: a step that already succeeded is never re-run on retry, only the failed (and not-yet-attempted) steps are.
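One way to picture step-level retries is a replay with memoized steps. The sketch below is a toy model under stated assumptions, not how the platform is implemented: `runWithReplay` and `StepFn` are hypothetical, step results are cached by step name, and `maxAttempts` stands in for the undocumented platform default. On each attempt the whole body runs again, succeeded steps return their cached value without re-executing, and an error thrown outside any step fails the run immediately.

```typescript
// Toy model of step-level retries; NOT the platform's implementation.
// Assumptions: steps are identified by name, and maxAttempts stands in for the platform default.
type StepFn = <T>(name: string, fn: () => Promise<T>) => Promise<T>;

async function runWithReplay(
  body: (step: StepFn) => Promise<void>,
  maxAttempts: number,
): Promise<{ attempts: number; executions: Map<string, number> }> {
  const cache = new Map<string, unknown>(); // results of steps that already succeeded
  const executions = new Map<string, number>(); // how often each callback actually ran
  let stepFailed = false; // did the current attempt fail inside a step?

  const step: StepFn = async <T>(name: string, fn: () => Promise<T>): Promise<T> => {
    if (cache.has(name)) return cache.get(name) as T; // succeeded earlier: never re-run
    executions.set(name, (executions.get(name) ?? 0) + 1);
    try {
      const value = await fn();
      cache.set(name, value);
      return value;
    } catch (err) {
      stepFailed = true; // a failed step is not cached, so the next attempt runs it again
      throw err;
    }
  };

  for (let attempt = 1; ; attempt++) {
    stepFailed = false;
    try {
      await body(step);
      return { attempts: attempt, executions };
    } catch (err) {
      // Errors thrown outside any step fail the run at once; step failures are retried.
      if (!stepFailed || attempt >= maxAttempts) throw err;
    }
  }
}
```

Tracking `executions` makes the rule visible: a step that succeeded on the first attempt still shows exactly one execution after later attempts, while the step that kept failing shows one execution per attempt.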
A few consequences worth knowing:
- Side effects must tolerate repetition. A step that posts to an external API might post twice across a retry. If the receiving system can’t recognize a repeated request on its own, include an idempotency key (e.g. the runId) in the request so that a server which honors the key collapses the duplicate calls.
- try/catch opts out of platform retry. Catching an error tells the platform “I’ve handled this”: the step is recorded as failed, but the run continues. If you’d rather let the platform retry, don’t catch.
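The server-side collapse in the first bullet is the receiver's job, not the platform's. Here is a minimal sketch of a receiver that honors an idempotency key. `IdempotentReceiver` and its `handle` method are hypothetical, and the in-memory `Map` is an assumption standing in for the durable storage a real service would need:

```typescript
// Hypothetical receiving service. An in-memory Map stands in for the durable store a real one would need.
class IdempotentReceiver {
  private readonly responses = new Map<string, string>();
  sideEffects = 0; // counts how many times the real work actually happened

  handle(idempotencyKey: string, payload: string): string {
    const previous = this.responses.get(idempotencyKey);
    if (previous !== undefined) return previous; // repeat of an earlier request: replay its response
    this.sideEffects++; // first time this key is seen: do the work (e.g. send the notification)
    const response = `accepted:${payload}`;
    this.responses.set(idempotencyKey, response);
    return response;
  }
}
```

Because the key is what makes two requests count as "the same", it has to stay stable across retries. A fresh random value generated inside the failing step would differ on each attempt, which is why a value like the runId works better.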
Tuning retry counts and backoff per step isn’t exposed today; you get the platform defaults.
Handle a failure and keep going
To recover from a step failure inside the loop, wrap the step in try/catch:
```typescript
let scan;
try {
  await step.do("Generate a health scan", async ({ ctx }) => {
    await ctx.FACIAL_SCAN.create({ expiresIn: 3600 });
  });
  scan = await step.waitForScan({ timeout: "30 minutes" });
} catch (error) {
  console.error("scan failed, falling back to manual review", error);
  scan = { status: "fallback" };
}
// Loop continues with `scan` set
```

What this produces:
- The step is still recorded as failed in the timeline (the failure happened — the timeline doesn’t lie).
- The run itself does not fail; you absorbed the error.
- The fallback path inside catch runs normally and counts toward the rest of the timeline.
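If several steps need the same absorb-and-continue treatment, the try/catch can be factored into a small helper. `withFallback` is a hypothetical name, not part of the platform API; this sketch relies on nothing beyond standard TypeScript:

```typescript
// Hypothetical helper (not a platform API): run `attempt`; if it throws,
// hand the error to `fallback` and return its value so the caller keeps going.
async function withFallback<T, F>(
  attempt: () => Promise<T>,
  fallback: (error: unknown) => F | Promise<F>,
): Promise<T | F> {
  try {
    // `return await` (not plain `return`) so a rejection is caught by this try block.
    return await attempt();
  } catch (error) {
    return await fallback(error);
  }
}
```

With it, the scan example could bind `scan` with `const`: pass an async function that calls `step.do(...)` and `step.waitForScan(...)` as `attempt`, and a function that logs and returns `{ status: "fallback" }` as `fallback`. The failed step is still recorded in the timeline either way; the helper only changes where the catch lives.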
Fail a run on purpose
To deliberately fail a run — for example, on an unrecoverable input validation error — throw from run():
```typescript
async run(event: HealthEvent, step: HealthStep) {
  if (!event.body?.patientId) {
    throw new Error("patientId is required");
  }
  // …
}
```

The error’s message and stack are captured against the run, and the run’s status becomes failed.
Returning a value from run() always marks the run as succeeded; the platform doesn’t inspect what you returned. To distinguish “succeeded with a problem” from “failed,” throw for failures and return for everything else.
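The rule fits in a few lines of code. This toy `settleRun` function is hypothetical, not the platform's code, but it mirrors the documented behavior: whatever run() returns is passed through as output without inspection, and only a throw produces a failed status with the error's message and stack.

```typescript
// Toy model of how a run's status follows from run(); NOT the platform's code.
type RunResult =
  | { status: "succeeded"; output: unknown }
  | { status: "failed"; error: { message: string; stack?: string } };

async function settleRun(run: () => Promise<unknown>): Promise<RunResult> {
  try {
    // Whatever run() returns is passed through as output; it is never inspected.
    return { status: "succeeded", output: await run() };
  } catch (err) {
    // Only a throw marks the run failed, capturing the message and stack.
    const e = err instanceof Error ? err : new Error(String(err));
    return { status: "failed", error: { message: e.message, stack: e.stack } };
  }
}
```

In this model a returned value such as `{ ok: false }` still counts as a success, which is why failures have to be thrown.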
Examples
Use a deterministic idempotency key
Derive an idempotency key from data on event.body so the same input always produces the same key. Duplicate calls across a retry then carry the same key and collapse on the server side:
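```typescript
async run(event: HealthEvent<IntakePayload>, step: HealthStep) {
  // Same patientId, same key: a retried POST carries the key it sent the first time
  const idempotencyKey = `intake:${event.body.patientId}`;
  await step.do("Send notification", async () => {
    const res = await fetch("https://api.example.com/send-notification", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "Idempotency-Key": idempotencyKey,
      },
      body: JSON.stringify(event.body),
    });
    if (!res.ok) {
      // fetch() resolves on HTTP error statuses; throw so the step fails and the platform retries
      throw new Error(`notification failed: ${res.status}`);
    }
  });
}
```

Fail fast on bad input, retry on transient errors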
Combine the two patterns: throw for input you don’t accept, and let the platform retry the things that might recover on their own:
```typescript
async run(event: HealthEvent<IntakePayload>, step: HealthStep) {
  const patientId = event.body?.patientId;
  if (!patientId) {
    throw new Error("patientId is required"); // thrown outside any step: run fails immediately
  }
  await step.do("Load profile", async () => {
    const res = await fetch(`https://api.example.com/patients/${patientId}`);
    if (!res.ok) {
      // fetch() resolves on HTTP error statuses, so throw to fail the step; platform retries
      throw new Error(`profile lookup failed: ${res.status}`);
    }
    return res.json();
  });
}
```
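The example above throws on every non-OK response, so a 404 for an unknown patient is handed to the platform's retry just like a 503. If you want to separate the two, a status classifier is one option. This sketch rests on an assumption you should revisit for your own API: that 408, 429, and 5xx responses are worth retrying and other 4xx responses are not. `classifyStatus` is a hypothetical helper:

```typescript
// Assumption: which statuses count as transient is a judgment call, not a platform rule.
type Outcome = "ok" | "retry" | "fail";

function classifyStatus(status: number): Outcome {
  // Same range that fetch's Response.ok treats as success.
  if (status >= 200 && status < 300) return "ok";
  // Request timeout, rate limiting, and server errors may clear up on a later attempt.
  if (status === 408 || status === 429 || status >= 500) return "retry";
  // Other 4xx responses (bad request, not found, ...) will fail the same way every time.
  return "fail";
}
```

Inside the step, you would throw only when the outcome is "retry". For "fail", the step could return a marker that run() checks afterwards, throwing there so the run fails fast in the same way as the patientId check.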