
# Errors

> The per-surface error envelopes, FlexInference codes, and how provider errors are normalized.

FlexInference shapes every error to match the endpoint you called, so the SDK you already use parses it. Call `/v1/responses` or `/v1/chat/completions` and you get OpenAI's envelope. Call `/v1/messages` and you get Anthropic's. Call `/v1/interactions` and you get Google's. This holds for both FlexInference's own errors and errors from the upstream provider, so one endpoint always returns one error shape.

On `/v1/responses` and `/v1/chat/completions`, the OpenAI envelope:

```json theme={null}
{
  "error": {
    "message": "The API key on this request was rejected - either it does not match FlexInference's key format (`flex_live_` followed by 24 hex characters and a 6-hex checksum), or it is well-formed but does not match any key on record. Copy the exact, current key from the FlexInference dashboard under Settings -> API keys, then send it as `Authorization: Bearer <key>`.",
    "type": "authentication_error",
    "code": "invalid_api_key",
    "param": null,
    "doc_url": "https://flexinference.mintlify.app/errors#invalid_api_key"
  }
}
```

The same error on `/v1/messages` (Anthropic's shape) and `/v1/interactions` (Google's shape):

```json theme={null}
{ "type": "error", "error": { "type": "authentication_error", "message": "..." } }
```

```json theme={null}
{ "error": { "code": 401, "message": "...", "status": "UNAUTHENTICATED" } }
```

The status code and message are the same across all three; only the wrapper changes. The full detail lives on the OpenAI envelope. There, `type` mirrors OpenAI's broad categories (`invalid_request_error`, `authentication_error`, `rate_limit_error`), plus two of our own: `flexinference_error` for internal faults and `billing_error` when a billing condition (like an overdue balance) pauses billable requests. `code` is the machine-readable reason. `param` names the offending field when there is one. `doc_url` links to that code's row below, or is `null` when unknown. The Anthropic shape carries an Anthropic `type` (mapped from the status) plus the message; the Google shape carries the numeric `code`, the message, and a canonical `status`. The FlexInference-specific `code` and `doc_url` appear only on the OpenAI envelope.
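If one client talks to more than one surface, the three envelopes can be flattened into a single shape before you branch on them. The sketch below is illustrative only: `summarize_error` and its output keys (`surface`, `category`, `code`, `message`) are hypothetical names, not part of any SDK, and the detection rules are inferred from the three example bodies above.

```python
from typing import Any, Optional


def summarize_error(body: Any) -> Optional[dict]:
    """Flatten an error body from any of the three envelopes, or return None.

    Hypothetical helper: the output keys are our own naming, not an SDK contract.
    """
    if not isinstance(body, dict):
        return None
    err = body.get("error")
    if not isinstance(err, dict):
        return None

    if body.get("type") == "error":
        # Anthropic shape: {"type": "error", "error": {"type": ..., "message": ...}}
        return {
            "surface": "anthropic",
            "category": err.get("type"),
            "code": None,  # FlexInference codes appear only on the OpenAI envelope
            "message": err.get("message"),
        }

    if "status" in err:
        # Google shape: {"error": {"code": <HTTP status>, "message": ..., "status": ...}}
        return {
            "surface": "google",
            "category": err.get("status"),
            "code": None,  # here err["code"] is the numeric HTTP status, not a FlexInference code
            "message": err.get("message"),
        }

    # OpenAI shape: {"error": {"message", "type", "code", "param", "doc_url"}}
    return {
        "surface": "openai",
        "category": err.get("type"),
        "code": err.get("code"),
        "message": err.get("message"),
    }
```

Because `code` is `None` on the Anthropic and Google shapes, code on those surfaces has to fall back to the HTTP status and `category`.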

## FlexInference codes

These errors come from our own layer. Each message says what was wrong, which rule it broke, and how to fix it, with an example where one helps. You can branch on `error.code` and show `error.message` straight to a developer without adding your own explanation.

Several of these errors involve `start_within`, so it helps to know how it works. FlexInference tries a cheaper flex tier first. If that tier cannot start your request within the time you allow, FlexInference sends the same request to your standard tier so it still finishes. The cheap tier runs first and your standard tier is the backup. This is the reverse of the common fallback pattern, where you start on the strong model and drop down to a cheaper one only when it fails.

| Status | Code                                                                          | Meaning                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| ------ | ----------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `401`  | <span id="missing_api_key" /> `missing_api_key`                               | No `Authorization` header, or it did not match the required `Bearer <key>` shape. Add `Authorization: Bearer <your key>`.                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| `401`  | <span id="invalid_api_key" /> `invalid_api_key`                               | The key does not match FlexInference's format (`flex_live_` + 24 hex chars + a 6-hex checksum), or it is well-formed but unknown (revoked, regenerated, or mistyped). Copy the current key from the dashboard.                                                                                                                                                                                                                                                                                                                                                                                         |
| `429`  | <span id="rate_limit_exceeded" /> `rate_limit_exceeded`                       | Too many failed-authentication attempts from your client IP, or too many requests from your organization in the current window. Back off for the number of seconds in the `Retry-After` header (60s for the auth-flood limiter, 5s for the per-org limiter) before retrying.                                                                                                                                                                                                                                                                                                                           |
| `402`  | <span id="payment_required" /> `payment_required`                             | This organization has an overdue balance or no payment method on file, so billable flex requests are paused. Free routing (`default`, `priority`, `auto`) and all Anthropic models keep working. Add or update a card on the dashboard [Billing](/billing) page, then retry. This is the only code with `type: billing_error`.                                                                                                                                                                                                                                                                         |
| `413`  | <span id="request_too_large" /> `request_too_large`                           | The request body is over FlexInference's 1,000,000-byte limit (checked against both the declared `Content-Length` and the bytes actually streamed). Large inline base64 files are the usual cause. Shrink the payload, or point to a hosted URL instead.                                                                                                                                                                                                                                                                                                                                               |
| `400`  | <span id="no_byok_key" /> `no_byok_key`                                       | No OpenAI key configured for your org (or the stored key failed to decrypt, which fails closed the same as missing). Add one in the dashboard under Settings -> API keys.                                                                                                                                                                                                                                                                                                                                                                                                                              |
| `400`  | <span id="no_gemini_key" /> `no_gemini_key`                                   | Same as `no_byok_key`, for a `gemini-*` model: no Gemini key configured for your org. Add one in the dashboard.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
| `400`  | <span id="no_anthropic_key" /> `no_anthropic_key`                             | Same as `no_byok_key`, for a `claude-*` model: no Anthropic key configured for your org. Add one in the dashboard.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
| `400`  | <span id="missing_model" /> `missing_model`                                   | The `model` field is missing, empty, or not a string. Add a non-empty string, for example `"model": "gpt-5.5"`.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
| `400`  | <span id="missing_start_within" /> `missing_start_within`                     | You did not send the required `start_within` field, and it has no default. `start_within` tells FlexInference how long you are willing to wait for the request to start. Set it to `default`, `priority`, `auto`, or a duration like `HHh-MMm-SSs`, for example `00h-00m-30s` to wait up to thirty seconds.                                                                                                                                                                                                                                                                                            |
| `400`  | <span id="invalid_start_within" /> `invalid_start_within`                     | `start_within` is not a string, doesn't match one of the four accepted forms, or (for a duration) resolves outside the 5-second-to-600-second (10 minute) allowed window. Match the exact shape, and keep any duration inside 5s-600s. For example, use `"start_within": "00h-00m-30s"` for a thirty-second window, or `"start_within": "default"` to skip the flex race.                                                                                                                                                                                                                              |
| `400`  | <span id="model_not_flex_capable" /> `model_not_flex_capable`                 | A duration `start_within` (the flex race) named a model outside that provider's flex-capable allow-list (OpenAI and Gemini each keep their own list; Anthropic has none). Use a flex-capable [model](/models), or set `start_within` to `default`, `priority`, or `auto` to run this model without the race.                                                                                                                                                                                                                                                                                           |
| `400`  | <span id="auto_unsupported_for_gemini" /> `auto_unsupported_for_gemini`       | `start_within: "auto"` on a Gemini model. Gemini's tiers are only `flex`, `standard`, and `priority` - there is no `auto` to resolve to. Use `default`, `priority`, or a duration instead.                                                                                                                                                                                                                                                                                                                                                                                                             |
| `400`  | <span id="flex_unsupported_for_anthropic" /> `flex_unsupported_for_anthropic` | A duration `start_within` on a `claude-*` model. Anthropic has no flex tier at all (a provider limitation, not a model-specific one). Use `default`, `priority`, or `auto` instead.                                                                                                                                                                                                                                                                                                                                                                                                                    |
| `400`  | <span id="missing_max_tokens" /> `missing_max_tokens`                         | A `claude-*` request reached Anthropic with no token cap; Anthropic's Messages API requires one and FlexInference will not synthesize a silent default. Set `max_output_tokens` (Responses), `max_completion_tokens` (Chat), or `max_tokens` (`/v1/messages`).                                                                                                                                                                                                                                                                                                                                         |
| `400`  | <span id="service_tier_not_allowed" /> `service_tier_not_allowed`             | A caller-supplied `service_tier`. FlexInference derives the tier from `start_within` and would otherwise silently overwrite your value, so it rejects the field instead. Remove `service_tier` and express the same intent through `start_within`. Applies on `/v1/interactions` and `/v1/messages` too.                                                                                                                                                                                                                                                                                               |
| `400`  | <span id="unsupported_parameter" /> `unsupported_parameter`                   | A parameter with no Responses-API equivalent on the models we serve was set. On Chat: any of `presence_penalty`, `frequency_penalty`, `logit_bias`, `logprobs`, `top_logprobs`, `seed`, `stop`, `prediction`, `audio`, non-text `modalities`, or `web_search_options` (use a Responses `web_search` tool instead). On `/v1/messages`: `top_k`, `stop_sequences`, `cache_control` blocks, or `document`/`file` content blocks with `citations` (all deferred). Remove the named field; `error.param` names exactly which one. See [SDKs](/sdks).                                                        |
| `400`  | <span id="unsupported_generation_config" /> `unsupported_generation_config`   | On `/v1/interactions`, `generation_config.seed` or `generation_config.stop_sequences` was set; neither has a Responses equivalent on the models we serve. Remove the named field.                                                                                                                                                                                                                                                                                                                                                                                                                      |
| `400`  | <span id="unsupported_value" /> `unsupported_value`                           | Chat Completions `n` was set to something other than `1`. The underlying Responses API always produces a single generation, so `n` must be `1` or omitted. For example, send `"n": 1` or leave `n` out.                                                                                                                                                                                                                                                                                                                                                                                                |
| `400`  | <span id="invalid_json" /> `invalid_json`                                     | The body isn't syntactically valid JSON (`JSON.parse` failed). Fix the syntax - a trailing comma or unquoted key is a common cause.                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
| `400`  | <span id="invalid_body" /> `invalid_body`                                     | The body parsed as JSON but the top-level value isn't a JSON object (it's an array, string, number, boolean, or null). Wrap your fields in a single `{...}` object.                                                                                                                                                                                                                                                                                                                                                                                                                                    |
| `404`  | <span id="unknown_url" /> `unknown_url`                                       | Wrong path. Use `/v1/responses`, `/v1/chat/completions`, `/v1/interactions`, or `/v1/messages`.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
| `405`  | <span id="method_not_allowed" /> `method_not_allowed`                         | The path exists, but every FlexInference endpoint only accepts `POST`. Send a `POST` with a JSON body instead.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
| `499`  | <span id="client_closed_request" /> `client_closed_request`                   | The client disconnected (closed tab, cancelled fetch, or a client-side timeout) before FlexInference finished responding. Not counted as a service failure. If you did not mean to cancel, raise your client timeout and let the request finish. A flex request can legitimately take up to 10 minutes.                                                                                                                                                                                                                                                                                                |
| `500`  | <span id="internal_error" /> `internal_error`                                 | Something failed inside FlexInference outside of input validation - our bug, not your request. Safe to retry once; if it persists, contact support with the approximate timestamp and your org.                                                                                                                                                                                                                                                                                                                                                                                                        |
| `502`  | <span id="upstream_stream_failed" /> `upstream_stream_failed`                 | The upstream provider committed a response but its stream then ended with no completion event, or failed mid-stream with no error of its own to pass through - a transient upstream or network fault between FlexInference and the provider, not your request. Retry once; if it persists for one model the provider or that model may be degraded (try another model, or set a shorter `start_within` window so a slow attempt falls back to your standard tier sooner). Distinct from an upstream `5xx` the provider itself returned, which is reshaped to your surface with the provider's message. |
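Several of the request-shape rules in the table above can be checked client-side before a request is sent. This is a minimal sketch, not FlexInference's validator: `preflight` and `duration_seconds` are hypothetical helpers, and the two-digit reading of `HHh-MMm-SSs`, the order of the checks, and measuring body size with `json.dumps` are all assumptions. The server remains the source of truth.

```python
import json
import re
from typing import Optional

MAX_BODY_BYTES = 1_000_000  # request_too_large limit from the table
KEYWORDS = {"default", "priority", "auto"}
# Assumption: each field of HHh-MMm-SSs is exactly two digits.
DURATION_RE = re.compile(r"(\d{2})h-(\d{2})m-(\d{2})s")


def duration_seconds(value: str) -> Optional[int]:
    """Return total seconds for an HHh-MMm-SSs string, or None if it does not match."""
    match = DURATION_RE.fullmatch(value)
    if match is None:
        return None
    hours, minutes, seconds = (int(part) for part in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def preflight(body: dict) -> Optional[str]:
    """Return the error code this body would likely receive, or None if no rule here is broken."""
    model = body.get("model")
    if not isinstance(model, str) or not model:
        return "missing_model"
    if "start_within" not in body:
        return "missing_start_within"
    value = body["start_within"]
    if not isinstance(value, str):
        return "invalid_start_within"
    if value not in KEYWORDS:
        # Not a keyword, so it must be a duration inside the 5s-600s window.
        total = duration_seconds(value)
        if total is None or not 5 <= total <= 600:
            return "invalid_start_within"
        if model.startswith("claude-"):
            return "flex_unsupported_for_anthropic"
    elif value == "auto" and model.startswith("gemini-"):
        return "auto_unsupported_for_gemini"
    # Assumption: the SDK's serialized body is about the same size as json.dumps output.
    if len(json.dumps(body).encode("utf-8")) > MAX_BODY_BYTES:
        return "request_too_large"
    return None
```

For example, `preflight({"model": "gpt-5.5", "start_within": "00h-00m-04s"})` returns `"invalid_start_within"` because four seconds is below the five-second minimum. The sketch does not cover `model_not_flex_capable`, since the allow-lists live on the [models](/models) page.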

## Provider errors

An error from the upstream provider (a `400` for a bad parameter, a `401` for an invalid provider key, a `429` rate limit on the standard attempt) is normalized the same way as our own. FlexInference keeps the provider's status and message and re-emits them in your surface's envelope. You never receive the provider's raw body, so you do not have to special-case three vendors' error shapes: call `/v1/messages` and even an OpenAI-model error comes back Anthropic-shaped. The FlexInference-specific `code` and `doc_url` are on the OpenAI envelope only; a provider error carries the provider's message with the surface's `type` or `status`.

<Tip>
  On the OpenAI surface, branch on `error.code` for FlexInference-specific handling (for example, prompt the user to add a BYOK key on `no_byok_key`), and treat unknown codes as ordinary OpenAI errors.
</Tip>

## Handling errors in code

The OpenAI SDKs raise an exception on any non-2xx response and expose the parsed error fields on it, including `code`.

<CodeGroup>
  ```python Python theme={null}
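  from openai import APIStatusError

  # `client` is an OpenAI client already configured for FlexInference.
  try:
      resp = client.responses.create(
          model="gpt-5.5",
          input="hi",
          # start_within is a FlexInference field, so it goes in extra_body.
          extra_body={"start_within": "00h-00m-30s"},
      )
  except APIStatusError as e:
      # The SDK unwraps the top-level "error" object, so read the code from e.code.
      if e.code == "no_byok_key":
          print("Add your OpenAI key in the dashboard.")
      else:
          raise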
  ```

  ```typescript Node theme={null}
  import OpenAI from "openai";

  // `client` is an OpenAI client already configured for FlexInference.
  try {
    // start_within is not in the SDK's request types, hence the cast.
    await client.responses.create({ model: "gpt-5.5", input: "hi", start_within: "00h-00m-30s" } as any);
  } catch (e) {
    if (e instanceof OpenAI.APIError && e.code === "no_byok_key") {
      console.log("Add your OpenAI key in the dashboard.");
    } else {
      throw e;
    }
  }
  ```
</CodeGroup>
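The retry guidance in the codes table (wait out `Retry-After` on a `429`, retry `internal_error` and `upstream_stream_failed` once) can be collected into one decision function. This is a sketch under stated assumptions: `retry_delay_seconds` is a hypothetical helper, the 60-second fallback for a missing header and the cap of three attempts are our own choices, and `Retry-After` is assumed to be a number of seconds.

```python
from typing import Optional

# Codes the table says are worth retrying once.
RETRY_ONCE_CODES = {"internal_error", "upstream_stream_failed"}
FALLBACK_429_DELAY = 60.0  # assumption: use the longer documented limiter window


def retry_delay_seconds(
    status: int,
    code: Optional[str],
    retry_after: Optional[str],
    attempt: int,
    max_attempts: int = 3,  # assumption: overall cap, not a FlexInference rule
) -> Optional[float]:
    """Return how long to wait before retrying, or None to stop.

    `attempt` is the number of attempts already made (1 after the first failure).
    """
    if attempt >= max_attempts:
        return None
    if status == 429:
        try:
            return float(retry_after)
        except (TypeError, ValueError):
            # Header missing or not a plain number of seconds.
            return FALLBACK_429_DELAY
    if code in RETRY_ONCE_CODES and attempt == 1:
        return 0.0
    return None
```

With the OpenAI Python SDK, the inputs would come from `e.status_code`, `e.code`, and `e.response.headers.get("retry-after")` on an `APIStatusError`. The OpenAI SDKs also retry some failures on their own (`max_retries`), so account for that before layering a second policy on top.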