401 | missing_api_key | No Authorization header, or it did not match the required Bearer <key> shape. Add Authorization: Bearer <your key>. |
401 | invalid_api_key | The key does not match FlexInference’s format (flex_live_ + 24 hex chars + a 6-hex checksum), or it is well-formed but unknown (revoked, regenerated, or mistyped). Copy the current key from the dashboard. |
429 | rate_limit_exceeded | Too many failed-authentication attempts from your client IP, or too many requests from your organization in the current window. Back off for the number of seconds in the Retry-After header (60s for the auth-flood limiter, 5s for the per-org limiter) before retrying. |
402 | payment_required | This organization has an overdue balance or no payment method on file, so billable flex requests are paused. Free routing (default, priority, auto) and all Anthropic models keep working. Add or update a card on the dashboard Billing page, then retry. This is the only code with type: billing_error. |
413 | request_too_large | The request body is over FlexInference’s 1,000,000-byte limit (checked against both the declared Content-Length and the bytes actually streamed). Large inline base64 files are the usual cause. Shrink the payload, or point to a hosted URL instead. |
400 | no_byok_key | No OpenAI key configured for your org (or the stored key failed to decrypt, which fails closed the same as missing). Add one in the dashboard under Settings -> API keys. |
400 | no_gemini_key | Same as no_byok_key, for a gemini-* model: no Gemini key configured for your org. Add one in the dashboard. |
400 | no_anthropic_key | Same as no_byok_key, for a claude-* model: no Anthropic key configured for your org. Add one in the dashboard. |
400 | missing_model | The model field is missing, empty, or not a string. Add a non-empty string, for example "model": "gpt-5.5". |
400 | missing_start_within | You did not send the required start_within field, and it has no default. start_within tells FlexInference how long you are willing to wait for the request to start. Set it to default, priority, auto, or a duration like HHh-MMm-SSs, for example 00h-00m-30s to wait up to thirty seconds. |
400 | invalid_start_within | start_within is not a string, doesn’t match one of the four accepted forms (default, priority, auto, or a duration), or is a duration that resolves outside the allowed window of 5 to 600 seconds (10 minutes). Match the exact shape, and keep any duration between 5s and 600s. For example, use "start_within": "00h-00m-30s" for a thirty-second window, or "start_within": "default" to skip the flex race. |
400 | model_not_flex_capable | A duration start_within (the flex race) named a model outside that provider’s flex-capable allow-list (OpenAI and Gemini each keep their own list; Anthropic has none). Use a flex-capable model, or set start_within to default, priority, or auto to run this model without the race. |
400 | auto_unsupported_for_gemini | start_within: "auto" on a Gemini model. Gemini’s tiers are only flex, standard, and priority - there is no auto to resolve to. Use default, priority, or a duration instead. |
400 | flex_unsupported_for_anthropic | A duration start_within on a claude-* model. Anthropic has no flex tier at all (a provider limitation, not a model-specific one). Use default, priority, or auto instead. |
400 | missing_max_tokens | A claude-* request reached Anthropic with no token cap; Anthropic’s Messages API requires one and FlexInference will not synthesize a silent default. Set max_output_tokens (Responses), max_completion_tokens (Chat), or max_tokens (/v1/messages). |
400 | service_tier_not_allowed | A caller-supplied service_tier. FlexInference derives the tier from start_within and would otherwise silently overwrite your value, so it rejects the field instead. Remove service_tier and express the same intent through start_within. Applies on /v1/interactions and /v1/messages too. |
400 | unsupported_parameter | A parameter with no Responses-API equivalent on the models we serve was set. On Chat: any of presence_penalty, frequency_penalty, logit_bias, logprobs, top_logprobs, seed, stop, prediction, audio, non-text modalities, or web_search_options (use a Responses web_search tool instead). On /v1/messages: top_k, stop_sequences, cache_control blocks, or document/file content blocks with citations (all deferred). Remove the named field; error.param names exactly which one. See SDKs. |
400 | unsupported_generation_config | On /v1/interactions, generation_config.seed or generation_config.stop_sequences was set; neither has a Responses equivalent on the models we serve. Remove the named field. |
400 | unsupported_value | Chat Completions n was set to something other than 1. The underlying Responses API always produces a single generation, so n must be 1 or omitted. For example, send "n": 1 or leave n out. |
400 | invalid_json | The body isn’t syntactically valid JSON (JSON.parse failed). Fix the syntax - a trailing comma or unquoted key is a common cause. |
400 | invalid_body | The body parsed as JSON but the top-level value isn’t a JSON object (it’s an array, string, number, boolean, or null). Wrap your fields in a single {...} object. |
404 | unknown_url | Wrong path. Use /v1/responses, /v1/chat/completions, /v1/interactions, or /v1/messages. |
405 | method_not_allowed | The path exists, but every FlexInference endpoint only accepts POST. Send a POST with a JSON body instead. |
499 | client_closed_request | The client disconnected (closed tab, cancelled fetch, or a client-side timeout) before FlexInference finished responding. Not counted as a service failure. If you did not mean to cancel, raise your client timeout and let the request finish. A flex request can legitimately take up to 10 minutes. |
500 | internal_error | Something failed inside FlexInference outside of input validation - our bug, not your request. Safe to retry once; if it persists, contact support with the approximate timestamp and your org. |
502 | upstream_stream_failed | The upstream provider committed a response, but its stream then ended with no completion event or failed mid-stream with no error of its own to pass through. This is a transient upstream or network fault between FlexInference and the provider, not a problem with your request. Retry once. If the failure persists for one model, the provider or that model may be degraded: try another model, or set a shorter start_within window so a slow attempt falls back to your standard tier sooner. Distinct from a verbatim upstream 5xx, which is passed through unchanged. |
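
The start_within rules and the retry guidance in the table can be checked on the client before a request goes out. The following is a minimal sketch, not FlexInference's own validation: the helper names check_start_within and retry_delay are hypothetical, the duration pattern assumes exactly two digits per component (as in the documented 00h-00m-30s example), the 5s and 600s bounds are assumed to be inclusive, and the one-second pause before the single retry and the 60-second fallback for a missing Retry-After header are illustrative choices rather than documented values.

```python
import re

# Duration shape from the table: HHh-MMm-SSs, e.g. "00h-00m-30s".
# Assumption: exactly two digits per component; minutes/seconds > 59 are not rejected here.
_DURATION_RE = re.compile(r"^(\d{2})h-(\d{2})m-(\d{2})s$")
_KEYWORDS = {"default", "priority", "auto"}
MIN_SECONDS, MAX_SECONDS = 5, 600  # assumed inclusive on both ends


def check_start_within(value):
    """Return None if value looks acceptable, else the error code the table predicts.

    Pass None to represent a request that omits the field entirely.
    """
    if value is None:
        return "missing_start_within"
    if not isinstance(value, str):
        return "invalid_start_within"
    if value in _KEYWORDS:
        return None
    match = _DURATION_RE.match(value)
    if match is None:
        return "invalid_start_within"
    hours, minutes, seconds = (int(group) for group in match.groups())
    total = hours * 3600 + minutes * 60 + seconds
    if not MIN_SECONDS <= total <= MAX_SECONDS:
        return "invalid_start_within"
    return None


def retry_delay(status, code, retry_after=None, attempt=0):
    """Return seconds to wait before retrying, or None if the request should not be retried.

    attempt is the number of retries already made for this request.
    """
    if status == 429 and code == "rate_limit_exceeded":
        # Honor the Retry-After header (seconds); fall back to 60s if it is
        # missing or unparseable (an assumption, the larger documented value).
        try:
            return float(retry_after)
        except (TypeError, ValueError):
            return 60.0
    if code in ("internal_error", "upstream_stream_failed"):
        # The table says to retry once; the 1-second pause is an illustrative choice.
        return 1.0 if attempt == 0 else None
    # Every other code in the table needs a change to the request or account first.
    return None
```

A caller might run check_start_within on the outgoing body and skip the network call when it returns a code, then consult retry_delay with the response status, error code, and Retry-After header to decide whether to wait and retry.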