default, priority, and auto requests. The models below can also run a flex request. You give start_within a duration, which is how long you are willing to wait for the request to start. FlexInference tries the cheaper flex tier first and runs it against that budget. If flex cannot start in time, your request falls back to the standard tier and still completes, so a missed budget never costs you the answer. See how flex racing works for the full mechanics. Pass the model by its alias (for example gpt-5.5 or gemini-3.5-flash), not a dated snapshot.
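As a rough illustration, a flex request body might look like the sketch below. It only builds the JSON and sends nothing. The top-level placement of start_within, the "30s" duration syntax, and the Responses-style input field are assumptions made for the example, and build_flex_request is a hypothetical helper, so check the API reference for the real shape.

```python
import json


def build_flex_request(model: str, prompt: str, start_within: str) -> dict:
    """Assemble a request body. Nothing is sent over the network."""
    return {
        "model": model,  # pass the alias, not a dated snapshot
        "input": prompt,  # Responses-style input field (assumed)
        # How long you are willing to wait for the request to start.
        # Top-level placement and the "30s" syntax are assumptions.
        "start_within": start_within,
    }


body = build_flex_request("gpt-5.5", "Summarize this changelog.", "30s")
print(json.dumps(body, indent=2))
```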
Flex-capable models
OpenAI
GPT-5.5
gpt-5.5, gpt-5.5-pro
GPT-5.4
gpt-5.4, gpt-5.4-mini, gpt-5.4-nano, gpt-5.4-pro
GPT-5.2
gpt-5.2, gpt-5.2-pro
GPT-5 and 5.1
gpt-5, gpt-5-mini, gpt-5-nano, gpt-5.1
Reasoning
o3, o4-mini
Gemini
Gemini 3.5 / 3.1
gemini-3.5-flash, gemini-3.1-pro-preview, gemini-3.1-flash-lite
Gemini 3 preview
gemini-3-flash-preview
Gemini 2.5
gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite

start_within on a model that is not on this list returns 400 model_not_flex_capable. The flex race only runs on the flex-capable models above. You have two ways to fix it. Pick one of the flex-capable models, or drop the duration and send the request as default, priority, or auto, which proxy any model straight to its provider. For Gemini, default maps to Gemini’s standard tier. Gemini has no auto tier, so auto on a Gemini model returns 400 auto_unsupported_for_gemini.
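If you want to catch these 400s before sending, a client-side check along these lines could work. It is a sketch, not FlexInference's own validation. The model set is copied from the list above and will go stale as models are added. Treating every value other than default, priority, or auto as a duration, detecting Gemini by the gemini- prefix, and the order of the checks are all assumptions. preflight is a hypothetical helper.

```python
from typing import Optional

# Copied from the flex-capable list on this page; it changes as models ship.
FLEX_CAPABLE = frozenset({
    "gpt-5.5", "gpt-5.5-pro",
    "gpt-5.4", "gpt-5.4-mini", "gpt-5.4-nano", "gpt-5.4-pro",
    "gpt-5.2", "gpt-5.2-pro",
    "gpt-5", "gpt-5-mini", "gpt-5-nano", "gpt-5.1",
    "o3", "o4-mini",
    "gemini-3.5-flash", "gemini-3.1-pro-preview", "gemini-3.1-flash-lite",
    "gemini-3-flash-preview",
    "gemini-2.5-pro", "gemini-2.5-flash", "gemini-2.5-flash-lite",
})
NAMED_TIERS = frozenset({"default", "priority", "auto"})


def preflight(model: str, start_within: str) -> Optional[str]:
    """Return the 400 error code this page describes, or None if none applies."""
    if start_within not in NAMED_TIERS:
        # Assumption: anything that is not a named tier is a duration.
        if model.startswith("claude-"):
            return "flex_unsupported_for_anthropic"
        if model not in FLEX_CAPABLE:
            return "model_not_flex_capable"
        return None
    # Assumption: Gemini models are recognized by their "gemini-" prefix.
    if start_within == "auto" and model.startswith("gemini-"):
        return "auto_unsupported_for_gemini"
    return None


print(preflight("gemini-2.5-pro", "auto"))
```

A None result only means none of the documented checks fired. The server stays the source of truth.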
Anthropic (Claude)
Claude models route to Anthropic and are proxy-only. They support default, priority, and auto, but not the flex race. A duration start_within on a claude-* model returns 400 flex_unsupported_for_anthropic, because Anthropic has no flex tier. default maps to Anthropic’s standard_only service tier, auto lets Anthropic pick, and priority is best-effort (mapped to auto, since Anthropic rejects a literal priority tier). Every claude-* request must set max_output_tokens (max_tokens on /v1/messages). If you leave it out, you get 400 missing_max_tokens. Set the field and resend.
Opus
claude-opus-4-8, claude-opus-4-7, claude-opus-4-6, claude-opus-4-5, claude-opus-4-1
Sonnet
claude-sonnet-5, claude-sonnet-4-6, claude-sonnet-4-5
Haiku
claude-haiku-4-5
Fable
claude-fable-5

Claude models work on /v1/responses, /v1/chat/completions, /v1/interactions, and the Anthropic-native /v1/messages. FlexInference translates each one to Anthropic’s Messages API. Anthropic decides the served tier, not FlexInference. A standard_only request comes back reported as "service_tier": "standard" on the response.
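The Claude rules in this section can be sketched as a lookup plus one field check. This mirrors the documented mapping only. It is not FlexInference's translation code, and both function names are hypothetical.

```python
# Requested tier -> Anthropic service tier, as described in this section.
ANTHROPIC_TIER = {
    "default": "standard_only",
    "auto": "auto",      # Anthropic picks the tier
    "priority": "auto",  # best-effort: Anthropic rejects a literal priority tier
}


def anthropic_service_tier(tier: str) -> str:
    """Look up the service tier forwarded to Anthropic for a named tier."""
    return ANTHROPIC_TIER[tier]


def missing_max_tokens(body: dict, endpoint: str) -> bool:
    """True when a claude-* request would get 400 missing_max_tokens."""
    # /v1/messages uses max_tokens; this page names max_output_tokens otherwise.
    field = "max_tokens" if endpoint == "/v1/messages" else "max_output_tokens"
    return field not in body


print(anthropic_service_tier("priority"))
print(missing_max_tokens({"model": "claude-haiku-4-5"}, "/v1/messages"))
```

The returned value is what gets requested, not what gets served. Anthropic makes the final decision, and the tier it actually used is reported back on the response.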
Flex errors
Each flex error tells you what happened, why, and the one change that fixes it. See the errors page for the full error shape and an example body.
- model_not_flex_capable means you sent a duration start_within to a model with no flex tier. Switch to a flex-capable model above, or drop the duration and use default, priority, or auto.
- auto_unsupported_for_gemini means you asked for auto on a Gemini model, which has no auto tier. Use default for Gemini’s standard tier.
- flex_unsupported_for_anthropic means you sent a duration start_within to a claude-* model, which has no flex tier. Drop the duration and use default, priority, or auto.
- missing_max_tokens means a claude-* request left out max_output_tokens (max_tokens on /v1/messages). Set it and resend.
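Because each error maps to one change, a client could apply the fix and resend. The sketch below takes the error code as a plain string, since the error body shape lives on the errors page and is not reproduced here. It assumes start_within is a top-level field, always picks default where several tiers would work, and uses 1024 as an arbitrary token cap. apply_fix is a hypothetical helper.

```python
def apply_fix(body: dict, code: str, max_tokens: int = 1024) -> dict:
    """Return a copy of the request body with the documented fix applied."""
    fixed = dict(body)  # leave the caller's dict untouched
    if code in (
        "model_not_flex_capable",
        "flex_unsupported_for_anthropic",
        "auto_unsupported_for_gemini",
    ):
        # Drop the duration (or auto) and fall back to the default tier.
        fixed["start_within"] = "default"
    elif code == "missing_max_tokens":
        # On /v1/messages the field would be max_tokens instead.
        fixed["max_output_tokens"] = max_tokens
    else:
        raise ValueError(f"no documented fix for {code!r}")
    return fixed


request = {"model": "claude-sonnet-5", "start_within": "30s"}
print(apply_fix(request, "flex_unsupported_for_anthropic"))
```

For model_not_flex_capable, switching to a flex-capable model is the other valid fix. The sketch picks the tier change only because it needs no model choice.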
Inputs
These models accept the input types you’d expect from OpenAI:
- Text: system, developer, user, and assistant messages.
- Images: pass image URLs or base64 data URLs. Base64 data URLs are the most reliable.
- Files: PDFs and other file inputs the Responses API accepts.
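Building a base64 data URL takes a few lines with the standard library. The helper name to_data_url is hypothetical. The content part at the end follows OpenAI's Responses format for image input, and whether FlexInference accepts exactly that shape is an assumption.

```python
import base64


def to_data_url(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as a data URL of the form data:<mime>;base64,<payload>."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Placeholder bytes stand in for a real file read with open(path, "rb").
url = to_data_url(b"\x89PNG placeholder bytes", "image/png")

# Responses-style image part (shape assumed, see the note above).
image_part = {"type": "input_image", "image_url": url}
print(image_part["image_url"][:22])
```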
reasoning.effort maps to a per-model thinking_level and may be clamped to the range that model supports.
Text, streaming, structured outputs, function calling, image input, file input, and web search work on both OpenAI and Gemini. For web search, send a Responses web_search tool; on Gemini we map it to google_search. Pass files as base64 data URLs (PDFs are the most reliable on Gemini).

We add new flex-capable models from OpenAI and Gemini as the providers ship them. If a model you expect is missing, check that you are using its alias and that the provider offers a flex tier for it.