FlexInference uses OpenAI’s error envelope, so any OpenAI client parses our errors without changes:
{
  "error": {
    "message": "The API key on this request was rejected - either it does not match FlexInference's key format (`flex_live_` followed by 24 hex characters and a 6-hex checksum), or it is well-formed but does not match any key on record. Copy the exact, current key from the FlexInference dashboard under Settings -> API keys, then send it as `Authorization: Bearer <key>`.",
    "type": "authentication_error",
    "code": "invalid_api_key",
    "param": null,
    "doc_url": "https://flexinference.mintlify.app/errors#invalid_api_key"
  }
}
type mirrors OpenAI’s broad categories (invalid_request_error, authentication_error, rate_limit_error), plus two of our own: flexinference_error for internal faults and billing_error when a billing condition (like an overdue balance) pauses billable requests. code is the machine-readable reason. param names the offending field when there is one. doc_url is an extra field we added, and clients that only read message, type, code, and param are not affected by it. When FlexInference knows the code, doc_url links straight to that code’s row below. When it does not, doc_url is null.
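As a rough illustration of the envelope described above, the sketch below turns an already-decoded JSON body into a small record. The names `FlexError` and `parse_error_envelope` are hypothetical and not part of any FlexInference or OpenAI SDK. The only assumption is that the body has the shape shown in the example, with `doc_url` possibly absent or null.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class FlexError:
    """Hypothetical record holding the five envelope fields."""
    message: str
    type: Optional[str]
    code: Optional[str]
    param: Optional[str]
    doc_url: Optional[str]  # extra field; absent or null on some errors


def parse_error_envelope(body: Any) -> Optional[FlexError]:
    """Return a FlexError if body looks like {"error": {...}}, else None."""
    if not isinstance(body, dict):
        return None
    err = body.get("error")
    if not isinstance(err, dict):
        return None
    return FlexError(
        message=str(err.get("message", "")),
        type=err.get("type"),
        code=err.get("code"),
        param=err.get("param"),
        # Clients that ignore doc_url lose nothing; .get() tolerates its absence.
        doc_url=err.get("doc_url"),
    )
```

A client that reads only `message`, `type`, `code`, and `param` can drop the `doc_url` line without changing anything else.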

FlexInference codes

These errors come from our own layer. Each message states what was wrong, the exact rule that was broken, and how to fix it, with an example where one helps. You can therefore branch on error.code and show error.message directly to a developer without adding any explanation of your own.

Several of these errors involve start_within, so one piece of background helps. FlexInference tries a cheaper flex tier first. If that tier cannot start your request within the time you allow, FlexInference sends the same request to your standard tier so that it still finishes. In other words, the cheap tier runs first and your standard tier is the backup. This is the reverse of the common fallback pattern, where you start on the strong model and drop to a cheaper one only when it fails.
| Status | Code | Meaning |
| --- | --- | --- |
| 401 | missing_api_key | No Authorization header, or it did not match the required Bearer <key> shape. Add Authorization: Bearer <your key>. |
| 401 | invalid_api_key | The key does not match FlexInference’s format (flex_live_ + 24 hex chars + a 6-hex checksum), or it is well-formed but unknown (revoked, regenerated, or mistyped). Copy the current key from the dashboard. |
| 429 | rate_limit_exceeded | Too many failed-authentication attempts from your client IP, or too many requests from your organization in the current window. Back off for the number of seconds in the Retry-After header (60s for the auth-flood limiter, 5s for the per-org limiter) before retrying. |
| 402 | payment_required | This organization has an overdue balance or no payment method on file, so billable flex requests are paused. Free routing (default, priority, auto) and all Anthropic models keep working. Add or update a card on the dashboard Billing page, then retry. This is the only code with type: billing_error. |
| 413 | request_too_large | The request body is over FlexInference’s 1,000,000-byte limit (checked against both the declared Content-Length and the bytes actually streamed). Large inline base64 files are the usual cause. Shrink the payload, or point to a hosted URL instead. |
| 400 | no_byok_key | No OpenAI key configured for your org (or the stored key failed to decrypt, which fails closed the same as missing). Add one in the dashboard under Settings -> API keys. |
| 400 | no_gemini_key | Same as no_byok_key, for a gemini-* model: no Gemini key configured for your org. Add one in the dashboard. |
| 400 | no_anthropic_key | Same as no_byok_key, for a claude-* model: no Anthropic key configured for your org. Add one in the dashboard. |
| 400 | missing_model | The model field is missing, empty, or not a string. Add a non-empty string, for example "model": "gpt-5.5". |
| 400 | missing_start_within | You did not send the required start_within field, and it has no default. start_within tells FlexInference how long you are willing to wait for the request to start. Set it to default, priority, auto, or a duration like HHh-MMm-SSs, for example 00h-00m-30s to wait up to thirty seconds. |
| 400 | invalid_start_within | start_within is not a string, doesn’t match one of the four accepted forms, or (for a duration) resolves outside the 5-second-to-600-second (10 minute) allowed window. Match the exact shape, and keep any duration inside 5s-600s. For example, use "start_within": "00h-00m-30s" for a thirty-second window, or "start_within": "default" to skip the flex race. |
| 400 | model_not_flex_capable | A duration start_within (the flex race) named a model outside that provider’s flex-capable allow-list (OpenAI and Gemini each keep their own list; Anthropic has none). Use a flex-capable model, or set start_within to default, priority, or auto to run this model without the race. |
| 400 | auto_unsupported_for_gemini | start_within: "auto" on a Gemini model. Gemini’s tiers are only flex, standard, and priority - there is no auto to resolve to. Use default, priority, or a duration instead. |
| 400 | flex_unsupported_for_anthropic | A duration start_within on a claude-* model. Anthropic has no flex tier at all (a provider limitation, not a model-specific one). Use default, priority, or auto instead. |
| 400 | missing_max_tokens | A claude-* request reached Anthropic with no token cap; Anthropic’s Messages API requires one and FlexInference will not synthesize a silent default. Set max_output_tokens (Responses), max_completion_tokens (Chat), or max_tokens (/v1/messages). |
| 400 | service_tier_not_allowed | A caller-supplied service_tier. FlexInference derives the tier from start_within and would otherwise silently overwrite your value, so it rejects the field instead. Remove service_tier and express the same intent through start_within. Applies on /v1/interactions and /v1/messages too. |
| 400 | unsupported_parameter | A parameter with no Responses-API equivalent on the models we serve was set. On Chat: any of presence_penalty, frequency_penalty, logit_bias, logprobs, top_logprobs, seed, stop, prediction, audio, non-text modalities, or web_search_options (use a Responses web_search tool instead). On /v1/messages: top_k, stop_sequences, cache_control blocks, or document/file content blocks with citations (all deferred). Remove the named field; error.param names exactly which one. See SDKs. |
| 400 | unsupported_generation_config | On /v1/interactions, generation_config.seed or generation_config.stop_sequences was set; neither has a Responses equivalent on the models we serve. Remove the named field. |
| 400 | unsupported_value | Chat Completions n was set to something other than 1. The underlying Responses API always produces a single generation, so n must be 1 or omitted. For example, send "n": 1 or leave n out. |
| 400 | invalid_json | The body isn’t syntactically valid JSON (JSON.parse failed). Fix the syntax - a trailing comma or unquoted key is a common cause. |
| 400 | invalid_body | The body parsed as JSON but the top-level value isn’t a JSON object (it’s an array, string, number, boolean, or null). Wrap your fields in a single {...} object. |
| 404 | unknown_url | Wrong path. Use /v1/responses, /v1/chat/completions, /v1/interactions, or /v1/messages. |
| 405 | method_not_allowed | The path exists, but every FlexInference endpoint only accepts POST. Send a POST with a JSON body instead. |
| 499 | client_closed_request | The client disconnected (closed tab, cancelled fetch, or a client-side timeout) before FlexInference finished responding. Not counted as a service failure. If you did not mean to cancel, raise your client timeout and let the request finish. A flex request can legitimately take up to 10 minutes. |
| 500 | internal_error | Something failed inside FlexInference outside of input validation - our bug, not your request. Safe to retry once; if it persists, contact support with the approximate timestamp and your org. |
| 502 | upstream_stream_failed | The upstream provider committed a response but its stream then ended with no completion event, or failed mid-stream with no error of its own to pass through - a transient upstream or network fault between FlexInference and the provider, not your request. Retry once; if it persists for one model the provider or that model may be degraded (try another model, or set a shorter start_within window so a slow attempt falls back to your standard tier sooner). Distinct from a verbatim upstream 5xx, which is passed through unchanged. |
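Several rules in the table above can be checked on the client before a request is sent. The sketch below is one possible preflight, not FlexInference's own validator. All function names are hypothetical. It assumes two digits per duration field, inclusive 5-second and 600-second bounds, and hex digits in either case. It checks only the shape of a key, because the checksum algorithm is not documented here.

```python
import json
import re
from typing import Optional

# Assumptions, not confirmed by the table: two digits per duration field,
# hex digits in either case, and inclusive 5s/600s bounds.
KEYWORDS = {"default", "priority", "auto"}
DURATION_RE = re.compile(r"([0-9]{2})h-([0-9]{2})m-([0-9]{2})s")
KEY_RE = re.compile(r"flex_live_[0-9a-fA-F]{24}[0-9a-fA-F]{6}")
MIN_SECONDS, MAX_SECONDS = 5, 600
MAX_BODY_BYTES = 1_000_000


def check_start_within(value: object) -> Optional[str]:
    """Return the error code this value would likely trigger, or None."""
    if value is None:
        return "missing_start_within"
    if not isinstance(value, str):
        return "invalid_start_within"
    if value in KEYWORDS:
        return None
    match = DURATION_RE.fullmatch(value)
    if match is None:
        return "invalid_start_within"
    hours, minutes, seconds = (int(part) for part in match.groups())
    total = hours * 3600 + minutes * 60 + seconds
    if not MIN_SECONDS <= total <= MAX_SECONDS:
        return "invalid_start_within"
    return None


def check_provider_rules(model: str, start_within: str) -> Optional[str]:
    """Apply the Gemini and Anthropic tier rules, keyed on the model prefix."""
    is_duration = DURATION_RE.fullmatch(start_within) is not None
    if model.startswith("gemini-") and start_within == "auto":
        return "auto_unsupported_for_gemini"
    if model.startswith("claude-") and is_duration:
        return "flex_unsupported_for_anthropic"
    return None


def key_shape_ok(key: str) -> bool:
    """Shape check only: flex_live_ + 24 hex + 6 hex. Does not verify the checksum."""
    return KEY_RE.fullmatch(key) is not None


def body_too_large(payload: dict) -> bool:
    """Approximate size check; the bytes an SDK actually sends may differ slightly."""
    return len(json.dumps(payload).encode("utf-8")) > MAX_BODY_BYTES
```

For example, `check_start_within("00h-00m-04s")` returns `"invalid_start_within"` because four seconds is below the 5-second minimum. The server remains the authority; a preflight like this only saves a round trip for the obvious cases.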

Pass-through errors

Anything that is not a FlexInference-layer error is the provider’s own response, returned to you unchanged. You get the same status and the same body the provider sent. A 400 for a bad parameter, a 401 for an invalid provider key, and a 429 rate limit on the standard attempt all reach you exactly as the provider wrote them. FlexInference never rewrites these errors. A pass-through error keeps whatever doc_url the provider sent, and FlexInference does not add one.
Branch on error.code for FlexInference-specific handling (for example, prompt the user to add a BYOK key on no_byok_key), and treat unknown codes as ordinary OpenAI errors.
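One way to structure that branching is a lookup table with a fallback. In the sketch below, `HINTS` and `hint_for` are hypothetical names, and the hint wording is paraphrased from the table rather than returned by the API. A code missing from the table yields `None`, which signals the caller to handle the error as an ordinary OpenAI error.

```python
from typing import Dict, Optional

# Illustrative hints for a few FlexInference codes; extend as needed.
HINTS: Dict[str, str] = {
    "no_byok_key": "Add your OpenAI key in the dashboard under Settings -> API keys.",
    "no_gemini_key": "Add your Gemini key in the dashboard.",
    "no_anthropic_key": "Add your Anthropic key in the dashboard.",
    "payment_required": "Add or update a card on the dashboard Billing page, then retry.",
}


def hint_for(code: Optional[str]) -> Optional[str]:
    """Return a user-facing hint for a known code, or None for anything else."""
    return HINTS.get(code)
```

Keeping the table small and returning `None` for everything else means a new or provider-specific code never gets a misleading hint.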

Handling errors in code

The OpenAI SDKs raise on non-2xx responses, with the parsed body attached.
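import os

from openai import APIStatusError, OpenAI

# Placeholder setup: these environment variable names are illustrative, not
# defined by FlexInference. Supply your own base URL and FlexInference key.
client = OpenAI(
    base_url=os.environ["FLEXINFERENCE_BASE_URL"],
    api_key=os.environ["FLEXINFERENCE_API_KEY"],
)

try:
    resp = client.responses.create(
        model="gpt-5.5",
        input="hi",
        # start_within is not an OpenAI SDK parameter, so it goes in extra_body.
        extra_body={"start_within": "00h-00m-30s"},
    )
except APIStatusError as e:
    # The SDK unwraps the "error" envelope: e.code, e.type, and e.param hold
    # the envelope fields, and e.body is the inner error object.
    code = e.code
    if code == "no_byok_key":
        print("Add your OpenAI key in the dashboard.")
    else:
        raise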
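The retry guidance in the table (wait for Retry-After on 429 rate_limit_exceeded, retry once on internal_error and upstream_stream_failed) can be gathered into one small decision function. This is a sketch under assumptions: `retry_delay` is a hypothetical name, Retry-After is taken to be a plain number of seconds as the table describes, and the header mapping is expected to use a lowercase key unless it is case-insensitive.

```python
from typing import Mapping, Optional

# Codes the table says are safe to retry once.
RETRY_ONCE_CODES = {"internal_error", "upstream_stream_failed"}


def retry_delay(
    status: int,
    code: Optional[str],
    headers: Mapping[str, str],
    attempt: int,
) -> Optional[float]:
    """Return seconds to wait before retrying, or None to give up.

    attempt is the number of retries already made (0 after the first failure).
    """
    if status == 429 and code == "rate_limit_exceeded":
        raw = headers.get("retry-after")
        if raw is None:
            return None
        try:
            return float(raw)
        except ValueError:
            # Not a plain number of seconds; do not guess a delay.
            return None
    if code in RETRY_ONCE_CODES and attempt == 0:
        return 0.0
    return None
```

With the OpenAI Python SDK, the inputs would come from the caught exception: `e.status_code`, `e.code`, and `e.response.headers`. The SDK also applies its own retries through the client's `max_retries` setting, so set that to 0 if this function should be the only retry policy.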