Errors - FlexInference

FlexInference uses OpenAI’s error envelope, so any OpenAI client parses our errors without changes:

{
  "error": {
    "message": "The API key on this request was rejected - either it does not match FlexInference's key format (`flex_live_` followed by 24 hex characters and a 6-hex checksum), or it is well-formed but does not match any key on record. Copy the exact, current key from the FlexInference dashboard under Settings -> API keys, then send it as `Authorization: Bearer <key>`.",
    "type": "authentication_error",
    "code": "invalid_api_key",
    "param": null,
    "doc_url": "https://flexinference.mintlify.app/errors#invalid_api_key"
  }
}

`type` mirrors OpenAI’s broad categories (`invalid_request_error`, `authentication_error`, `rate_limit_error`), plus two of our own: `flexinference_error` for internal faults and `billing_error` when a billing condition (like an overdue balance) pauses billable requests. `code` is the machine-readable reason. `param` names the offending field when there is one. `doc_url` is an extra field we added; clients that only read `message`, `type`, `code`, and `param` are not affected by it. When FlexInference knows the code, `doc_url` links straight to that code’s row below. When it does not, `doc_url` is null.
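
If you call the API without an SDK, the envelope above can be read with a few lines of Python. This is a minimal sketch, not an official client: the `ApiError` class and the `parse_error` function are hypothetical names introduced here, and the sample body is shortened for illustration.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ApiError:
    """Hypothetical container for the fields of the error envelope."""
    message: str
    type: Optional[str]
    code: Optional[str]
    param: Optional[str]
    doc_url: Optional[str]  # extra FlexInference field; may be null or absent


def parse_error(raw: str) -> ApiError:
    """Parse a raw error response body into an ApiError."""
    body = json.loads(raw)
    err = body.get("error") if isinstance(body, dict) else None
    if not isinstance(err, dict):
        err = {}
    return ApiError(
        message=err.get("message", ""),
        type=err.get("type"),
        code=err.get("code"),
        param=err.get("param"),
        doc_url=err.get("doc_url"),
    )


# Shortened sample body, for illustration only.
sample = json.dumps({
    "error": {
        "message": "The API key on this request was rejected.",
        "type": "authentication_error",
        "code": "invalid_api_key",
        "param": None,
        "doc_url": "https://flexinference.mintlify.app/errors#invalid_api_key",
    }
})
error = parse_error(sample)
print(error.code, error.doc_url)
```

Reading `doc_url` with `.get` keeps the parser usable for bodies that carry no such field.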

FlexInference codes

These errors come from our own layer. Each message states what was wrong, the exact rule it broke, and how to fix it, with an example where one helps. That means you can branch on `error.code` and show `error.message` straight to a developer without adding an explanation of your own.

Several of these errors involve `start_within`, so here is the one thing to know about it. FlexInference tries a cheaper flex tier first. If that tier cannot start your request within the time you allow, FlexInference sends the same request to your standard tier so it still finishes: the flex tier runs first and your standard tier is the backup. This is the reverse of the common fallback pattern, where you start on the strong model and drop down to a cheaper one only when it fails.
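
Because `start_within` accepts either a named mode or a duration string, a small client-side helper can catch mistakes before a request is sent. The sketch below is an illustration under assumptions: `format_start_within` and `is_valid_start_within` are hypothetical helpers, the two-digits-per-field pattern is inferred from the documented example `00h-00m-30s`, and the 5-to-600-second window is taken from the `invalid_start_within` row in the table. The server remains the authority on what it accepts.

```python
import re

NAMED_MODES = {"default", "priority", "auto"}
MIN_SECONDS, MAX_SECONDS = 5, 600

# Assumed shape: two digits per field, inferred from the example 00h-00m-30s.
DURATION_RE = re.compile(r"(\d{2})h-(\d{2})m-(\d{2})s")


def format_start_within(seconds: int) -> str:
    """Build a duration string such as 00h-00m-30s from a number of seconds."""
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        raise ValueError(
            f"start_within must be between {MIN_SECONDS} and {MAX_SECONDS} seconds"
        )
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}h-{minutes:02d}m-{secs:02d}s"


def is_valid_start_within(value: object) -> bool:
    """Local approximation of the documented start_within rules."""
    if not isinstance(value, str):
        return False
    if value in NAMED_MODES:
        return True
    match = DURATION_RE.fullmatch(value)
    if match is None:
        return False
    hours, minutes, secs = (int(group) for group in match.groups())
    total = hours * 3600 + minutes * 60 + secs
    return MIN_SECONDS <= total <= MAX_SECONDS


print(format_start_within(30))
print(is_valid_start_within("00h-00m-03s"))
```

Generating the string from an integer, rather than typing it by hand, avoids most `invalid_start_within` responses.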

Status	Code	Meaning
`401`	`missing_api_key`	No `Authorization` header, or it did not match the required `Bearer <key>` shape. Add `Authorization: Bearer <your key>`.
`401`	`invalid_api_key`	The key does not match FlexInference’s format (`flex_live_` + 24 hex chars + a 6-hex checksum), or it is well-formed but unknown (revoked, regenerated, or mistyped). Copy the current key from the dashboard.
`429`	`rate_limit_exceeded`	Too many failed-authentication attempts from your client IP, or too many requests from your organization in the current window. Back off for the number of seconds in the `Retry-After` header (60s for the auth-flood limiter, 5s for the per-org limiter) before retrying.
`402`	`payment_required`	This organization has an overdue balance or no payment method on file, so billable flex requests are paused. Free routing (`default`, `priority`, `auto`) and all Anthropic models keep working. Add or update a card on the dashboard Billing page, then retry. This is the only code with `type: billing_error`.
`413`	`request_too_large`	The request body is over FlexInference’s 1,000,000-byte limit (checked against both the declared `Content-Length` and the bytes actually streamed). Large inline base64 files are the usual cause. Shrink the payload, or point to a hosted URL instead.
`400`	`no_byok_key`	No OpenAI key configured for your org (or the stored key failed to decrypt, which fails closed the same as missing). Add one in the dashboard under Settings -> API keys.
`400`	`no_gemini_key`	Same as `no_byok_key`, for a `gemini-*` model: no Gemini key configured for your org. Add one in the dashboard.
`400`	`no_anthropic_key`	Same as `no_byok_key`, for a `claude-*` model: no Anthropic key configured for your org. Add one in the dashboard.
`400`	`missing_model`	The `model` field is missing, empty, or not a string. Add a non-empty string, for example `"model": "gpt-5.5"`.
`400`	`missing_start_within`	You did not send the required `start_within` field, and it has no default. `start_within` tells FlexInference how long you are willing to wait for the request to start. Set it to `default`, `priority`, `auto`, or a duration like `HHh-MMm-SSs`, for example `00h-00m-30s` to wait up to thirty seconds.
`400`	`invalid_start_within`	`start_within` is not a string, doesn’t match one of the four accepted forms, or (for a duration) resolves outside the 5-second-to-600-second (10 minute) allowed window. Match the exact shape, and keep any duration inside 5s-600s. For example, use `"start_within": "00h-00m-30s"` for a thirty-second window, or `"start_within": "default"` to skip the flex race.
`400`	`model_not_flex_capable`	A duration `start_within` (the flex race) named a model outside that provider’s flex-capable allow-list (OpenAI and Gemini each keep their own list; Anthropic has none). Use a flex-capable model, or set `start_within` to `default`, `priority`, or `auto` to run this model without the race.
`400`	`auto_unsupported_for_gemini`	`start_within: "auto"` on a Gemini model. Gemini’s tiers are only `flex`, `standard`, and `priority` - there is no `auto` to resolve to. Use `default`, `priority`, or a duration instead.
`400`	`flex_unsupported_for_anthropic`	A duration `start_within` on a `claude-*` model. Anthropic has no flex tier at all (a provider limitation, not a model-specific one). Use `default`, `priority`, or `auto` instead.
`400`	`missing_max_tokens`	A `claude-*` request reached Anthropic with no token cap; Anthropic’s Messages API requires one and FlexInference will not synthesize a silent default. Set `max_output_tokens` (Responses), `max_completion_tokens` (Chat), or `max_tokens` (`/v1/messages`).
`400`	`service_tier_not_allowed`	A caller-supplied `service_tier`. FlexInference derives the tier from `start_within` and would otherwise silently overwrite your value, so it rejects the field instead. Remove `service_tier` and express the same intent through `start_within`. Applies on `/v1/interactions` and `/v1/messages` too.
`400`	`unsupported_parameter`	A parameter with no Responses-API equivalent on the models we serve was set. On Chat: any of `presence_penalty`, `frequency_penalty`, `logit_bias`, `logprobs`, `top_logprobs`, `seed`, `stop`, `prediction`, `audio`, non-text `modalities`, or `web_search_options` (use a Responses `web_search` tool instead). On `/v1/messages`: `top_k`, `stop_sequences`, `cache_control` blocks, or `document`/`file` content blocks with `citations` (all deferred). Remove the named field; `error.param` names exactly which one. See SDKs.
`400`	`unsupported_generation_config`	On `/v1/interactions`, `generation_config.seed` or `generation_config.stop_sequences` was set; neither has a Responses equivalent on the models we serve. Remove the named field.
`400`	`unsupported_value`	Chat Completions `n` was set to something other than `1`. The underlying Responses API always produces a single generation, so `n` must be `1` or omitted. For example, send `"n": 1` or leave `n` out.
`400`	`invalid_json`	The body isn’t syntactically valid JSON (`JSON.parse` failed). Fix the syntax - a trailing comma or unquoted key is a common cause.
`400`	`invalid_body`	The body parsed as JSON but the top-level value isn’t a JSON object (it’s an array, string, number, boolean, or null). Wrap your fields in a single `{...}` object.
`404`	`unknown_url`	Wrong path. Use `/v1/responses`, `/v1/chat/completions`, `/v1/interactions`, or `/v1/messages`.
`405`	`method_not_allowed`	The path exists, but every FlexInference endpoint only accepts `POST`. Send a `POST` with a JSON body instead.
`499`	`client_closed_request`	The client disconnected (closed tab, cancelled fetch, or a client-side timeout) before FlexInference finished responding. Not counted as a service failure. If you did not mean to cancel, raise your client timeout and let the request finish. A flex request can legitimately take up to 10 minutes.
`500`	`internal_error`	Something failed inside FlexInference outside of input validation - our bug, not your request. Safe to retry once; if it persists, contact support with the approximate timestamp and your org.
`502`	`upstream_stream_failed`	The upstream provider committed a response but its stream then ended with no completion event, or failed mid-stream with no error of its own to pass through - a transient upstream or network fault between FlexInference and the provider, not your request. Retry once; if it persists for one model the provider or that model may be degraded (try another model, or set a shorter `start_within` window so a slow attempt falls back to your standard tier sooner). Distinct from a verbatim upstream `5xx`, which is passed through unchanged.
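
Several rows in the table describe checks that can be approximated locally before a request leaves your machine. The following sketch assumes a hypothetical `preflight` function; it checks only the shape of the key (the checksum algorithm is not documented here, and accepting upper-case hex is an assumption), and it measures the size of its own JSON encoding, which may differ slightly from the bytes your HTTP client actually sends.

```python
import json
import re
from typing import List

MAX_BODY_BYTES = 1_000_000

# Shape only: prefix, 24 hex characters, 6-hex checksum.
# Hex case is an assumption; the checksum itself is not verified.
KEY_RE = re.compile(r"flex_live_[0-9a-fA-F]{24}[0-9a-fA-F]{6}")


def preflight(api_key: str, payload: object) -> List[str]:
    """Return the FlexInference codes this request would probably trigger."""
    problems: List[str] = []
    if not api_key:
        problems.append("missing_api_key")
    elif KEY_RE.fullmatch(api_key) is None:
        problems.append("invalid_api_key")
    if not isinstance(payload, dict):
        # The top-level JSON value must be an object.
        problems.append("invalid_body")
        return problems
    if len(json.dumps(payload).encode("utf-8")) > MAX_BODY_BYTES:
        problems.append("request_too_large")
    model = payload.get("model")
    if not isinstance(model, str) or not model:
        problems.append("missing_model")
    if "start_within" not in payload:
        problems.append("missing_start_within")
    if "service_tier" in payload:
        problems.append("service_tier_not_allowed")
    return problems


example_key = "flex_live_" + "a" * 30  # placeholder, not a real key
print(preflight(example_key, {"model": "gpt-5.5", "start_within": "default"}))
print(preflight(example_key, {"model": "", "service_tier": "flex"}))
```

A local check like this only shortens the feedback loop; a request that passes it can still be rejected by the server.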

Pass-through errors

Anything that is not a FlexInference-layer error is the provider’s own response, returned to you unchanged. You get the same status and the same body the provider sent. A `400` for a bad parameter, a `401` for an invalid provider key, and a `429` rate limit on the standard attempt all reach you exactly as the provider wrote them. FlexInference never rewrites these errors. A pass-through error keeps whatever `doc_url` the provider sent, and FlexInference does not add one.

Branch on `error.code` for FlexInference-specific handling (for example, prompt the user to add a BYOK key on `no_byok_key`), and treat any code you do not recognize as an ordinary provider error.
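
One way to organize that branching is a lookup table from code to a user-facing hint, with everything else falling through to your normal error handling. This is a sketch only: `ACTION_HINTS` and `hint_for` are hypothetical names, and the hint wording is illustrative rather than text returned by the API.

```python
from typing import Optional

# Hints for codes the end user can resolve from the dashboard.
# The wording is illustrative and not returned by the API.
ACTION_HINTS = {
    "no_byok_key": "Add your OpenAI key in the dashboard under Settings -> API keys.",
    "no_gemini_key": "Add your Gemini key in the dashboard.",
    "no_anthropic_key": "Add your Anthropic key in the dashboard.",
    "payment_required": "Add or update a card on the dashboard Billing page.",
}


def hint_for(code: Optional[str]) -> Optional[str]:
    """Return a hint for a known code, or None to fall through."""
    if code is None:
        return None
    return ACTION_HINTS.get(code)


for code in ("no_gemini_key", "model_not_found", None):
    hint = hint_for(code)
    print(code, "->", hint if hint is not None else "handle as a normal API error")
```

Returning `None` for unrecognized codes keeps pass-through errors on the same path your application already uses for the provider's own errors.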

Handling errors in code

The OpenAI SDKs raise on non-2xx responses, with the parsed body attached.

from openai import APIStatusError

# `client` is an OpenAI SDK client already configured for FlexInference.
try:
    resp = client.responses.create(
        model="gpt-5.5",
        input="hi",
        extra_body={"start_within": "00h-00m-30s"},
    )
except APIStatusError as e:
    # The Python SDK unwraps the "error" envelope: e.body is the inner
    # object, and its code is also exposed directly as e.code.
    if e.code == "no_byok_key":
        print("Add your OpenAI key in the dashboard.")
    else:
        raise
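
The retry guidance in the table (wait for `Retry-After` on `rate_limit_exceeded`, retry once on `internal_error` and `upstream_stream_failed`, do not retry validation errors) can be collected into one small policy function. The sketch below is an assumption-laden illustration: `ApiFailure`, `retry_delay`, `call_with_retry`, and the cap of three rate-limit retries are all hypothetical, and `ApiFailure` stands in for whatever exception your client raises. With the OpenAI Python SDK, the equivalent inputs would be `e.code` and `e.response.headers`.

```python
import time
from typing import Callable, Mapping, Optional, TypeVar

T = TypeVar("T")

RETRY_ONCE = {"internal_error", "upstream_stream_failed"}
MAX_RATE_LIMIT_RETRIES = 3  # assumption; choose a cap that suits your workload
DEFAULT_BACKOFF = 5.0       # fallback when Retry-After is missing or not numeric


class ApiFailure(Exception):
    """Hypothetical stand-in for an SDK error carrying a code and headers."""

    def __init__(self, code: Optional[str], headers: Optional[Mapping[str, str]] = None):
        super().__init__(code)
        self.code = code
        self.headers = dict(headers or {})


def retry_delay(code: Optional[str], headers: Mapping[str, str], retries_so_far: int) -> Optional[float]:
    """Seconds to wait before the next attempt, or None to stop retrying."""
    if code == "rate_limit_exceeded" and retries_so_far < MAX_RATE_LIMIT_RETRIES:
        lowered = {key.lower(): value for key, value in headers.items()}
        try:
            return float(lowered.get("retry-after", DEFAULT_BACKOFF))
        except ValueError:
            return DEFAULT_BACKOFF
    if code in RETRY_ONCE and retries_so_far == 0:
        return 0.0
    return None


def call_with_retry(fn: Callable[[], T], sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying according to retry_delay; re-raise when it says stop."""
    retries = 0
    while True:
        try:
            return fn()
        except ApiFailure as exc:
            delay = retry_delay(exc.code, exc.headers, retries)
            if delay is None:
                raise
            sleep(delay)
            retries += 1


print(retry_delay("rate_limit_exceeded", {"Retry-After": "60"}, 0))
print(retry_delay("invalid_json", {}, 0))
```

Keeping the policy in a pure function makes it easy to test without a network, and passing `sleep` as a parameter lets tests skip the real wait.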
