FlexInference routes to OpenAI, Google Gemini, and Anthropic, and proxies any model for default, priority, and auto requests. The models below can also run a flex request. You set start_within to a duration, which is how long you are willing to wait for the request to start. FlexInference tries the cheaper flex tier first, within that budget. If flex cannot start in time, your request falls back to the standard tier and still completes, so a missed budget never costs you the answer. See how flex racing works for the full mechanics. Pass the model by its alias (for example gpt-5.5 or gemini-3.5-flash), not a dated snapshot.
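As a rough mental model of the outcome described above, the following Python sketch decides which tier ends up serving a request. It is a toy, not FlexInference's implementation: the function name served_tier, the flex_wait parameter, and the inclusive comparison at the boundary are all assumptions made for illustration, and the real mechanics are on the flex racing page.

```python
from datetime import timedelta


def served_tier(flex_wait: timedelta, start_within: timedelta) -> str:
    """Toy model of the outcome only (hypothetical helper, not a FlexInference API).

    flex_wait    -- how long the flex tier would take to start the request
    start_within -- the budget the caller is willing to wait
    """
    # Assumption: a flex start exactly at the budget still counts as in time.
    if flex_wait <= start_within:
        return "flex"
    # Missed budget: the request still completes, on the standard tier.
    return "standard"
```

Either way the caller gets an answer. The budget only decides which tier produces it.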

Flex-capable models

OpenAI

GPT-5.5

gpt-5.5, gpt-5.5-pro

GPT-5.4

gpt-5.4, gpt-5.4-mini, gpt-5.4-nano, gpt-5.4-pro

GPT-5.2

gpt-5.2, gpt-5.2-pro

GPT-5 and 5.1

gpt-5, gpt-5-mini, gpt-5-nano, gpt-5.1

Reasoning

o3, o4-mini

Gemini

Gemini 3.5 / 3.1

gemini-3.5-flash, gemini-3.1-pro-preview, gemini-3.1-flash-lite

Gemini 3 preview

gemini-3-flash-preview

Gemini 2.5

gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite
A duration start_within on a model that is not on this list returns 400 model_not_flex_capable. The flex race only runs on the flex-capable models above. You have two ways to fix it. Pick one of the flex-capable models, or drop the duration and send the request as default, priority, or auto, which proxy any model straight to its provider. For Gemini, default maps to Gemini’s standard tier. Gemini has no auto tier, so auto on a Gemini model returns 400 auto_unsupported_for_gemini.
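The second fix (drop the duration) can be applied client-side before a request is sent. The sketch below is one way to do that in Python. The set is copied from the list above and will go stale as models are added. The helper name choose_mode, the idea that the mode is a single value (either a duration or one of default, priority, auto), and the "30s" duration string in the usage note are assumptions, not part of the documented API.

```python
# Flex-capable aliases copied from the list above (snapshot; may go stale).
FLEX_CAPABLE = frozenset({
    "gpt-5.5", "gpt-5.5-pro",
    "gpt-5.4", "gpt-5.4-mini", "gpt-5.4-nano", "gpt-5.4-pro",
    "gpt-5.2", "gpt-5.2-pro",
    "gpt-5", "gpt-5-mini", "gpt-5-nano", "gpt-5.1",
    "o3", "o4-mini",
    "gemini-3.5-flash", "gemini-3.1-pro-preview", "gemini-3.1-flash-lite",
    "gemini-3-flash-preview",
    "gemini-2.5-pro", "gemini-2.5-flash", "gemini-2.5-flash-lite",
})


def choose_mode(model: str, duration: str) -> str:
    """Hypothetical helper: keep the duration only when the model can flex.

    For any other model, fall back to "default", which proxies the
    request straight to the provider instead of returning a 400.
    """
    return duration if model in FLEX_CAPABLE else "default"
```

For example, choose_mode("gemini-2.5-flash", "30s") keeps the duration, while a model that is not on the list gets "default". Falling back to default rather than auto keeps the helper safe for Gemini models, which have no auto tier.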

Anthropic (Claude)

Claude models route to Anthropic and are proxy-only. They support default, priority, and auto, but not the flex race. A duration start_within on a claude-* model returns 400 flex_unsupported_for_anthropic, because Anthropic has no flex tier. default maps to Anthropic’s standard_only service tier, auto lets Anthropic pick, and priority is best-effort (mapped to auto, since Anthropic rejects a literal priority tier). Every claude-* request must set max_output_tokens (max_tokens on /v1/messages). If you leave it out, you get 400 missing_max_tokens. Set the field and resend.
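The rules in this paragraph can be checked before a claude-* request leaves your code. The Python sketch below encodes the tier mapping and the two error conditions as described here. The function name, the mode argument, raising ValueError, and the order of the two checks are assumptions. The service decides which error it reports first.

```python
from typing import Optional

# Mapping stated above: requested mode -> Anthropic service tier.
# priority is best-effort and maps to auto.
ANTHROPIC_TIER = {
    "default": "standard_only",
    "auto": "auto",
    "priority": "auto",
}


def check_claude_request(mode: str, max_output_tokens: Optional[int]) -> str:
    """Hypothetical pre-flight check for a claude-* request.

    Returns the Anthropic tier the mode maps to, or raises ValueError
    carrying the error code the docs say the request would receive.
    """
    # Assumption: anything that is not one of the three keywords is a duration.
    if mode not in ANTHROPIC_TIER:
        raise ValueError("flex_unsupported_for_anthropic")
    # max_output_tokens is required (it is called max_tokens on /v1/messages).
    if max_output_tokens is None:
        raise ValueError("missing_max_tokens")
    return ANTHROPIC_TIER[mode]
```

Calling check_claude_request("priority", 1024) returns "auto", which mirrors the best-effort mapping described above.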

Opus

claude-opus-4-8, claude-opus-4-7, claude-opus-4-6, claude-opus-4-5, claude-opus-4-1

Sonnet

claude-sonnet-5, claude-sonnet-4-6, claude-sonnet-4-5

Haiku

claude-haiku-4-5

Fable

claude-fable-5
You can reach Claude through any caller format. That includes /v1/responses, /v1/chat/completions, /v1/interactions, and the Anthropic-native /v1/messages. FlexInference translates each one to Anthropic’s Messages API. Anthropic decides the served tier, not FlexInference. A standard_only request comes back reported as "service_tier": "standard" on the response.

Flex errors

Each flex error tells you what happened, why, and the one change that fixes it. See the errors page for the full error shape and an example body.
  • model_not_flex_capable means you sent a duration start_within to a model with no flex tier. Switch to a flex-capable model above, or drop the duration and use default, priority, or auto.
  • auto_unsupported_for_gemini means you asked for auto on a Gemini model, which has no auto tier. Use default for Gemini’s standard tier.
  • flex_unsupported_for_anthropic means you sent a duration start_within to a claude-* model, which has no flex tier. Drop the duration and use default, priority, or auto.
  • missing_max_tokens means a claude-* request left out max_output_tokens (max_tokens on /v1/messages). Set it and resend.
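Because each error names a single fix, a client can apply that fix and resend. The Python sketch below shows the idea on a plain dict. The request layout is hypothetical: the "mode" key stands in for wherever your request carries the duration or tier keyword, and the 1024 default for max_output_tokens is an arbitrary placeholder. How the error code is read out of the response body is covered on the errors page and is not shown here.

```python
# Errors whose fix is "drop the duration (or auto on Gemini) and use default".
USE_DEFAULT = {
    "model_not_flex_capable",
    "flex_unsupported_for_anthropic",
    "auto_unsupported_for_gemini",
}


def apply_fix(request: dict, code: str, max_output_tokens: int = 1024) -> dict:
    """Hypothetical helper: return a copy of the request with the one fix applied."""
    fixed = dict(request)  # shallow copy; the caller's dict is left untouched
    if code in USE_DEFAULT:
        # "mode" is an assumed key name, not a documented field.
        fixed["mode"] = "default"
    elif code == "missing_max_tokens":
        # On /v1/messages the field is named max_tokens instead.
        fixed["max_output_tokens"] = max_output_tokens
    else:
        raise ValueError(f"unrecognized flex error code: {code}")
    return fixed
```

This sketch always falls back to default. Switching to a flex-capable model is the other documented fix for model_not_flex_capable, and it is the one to choose when the cheaper flex tier matters more than the specific model.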

Inputs

These models accept the input types you’d expect from OpenAI:
  • Text: system, developer, user, and assistant messages.
  • Images: pass image URLs or base64 data URLs. Base64 data URLs are the most reliable.
  • Files: PDFs and other file inputs the Responses API accepts.
Output is text (including tool calls, structured outputs, and reasoning). Streaming works on every model. On Gemini, reasoning.effort maps to a per-model thinking_level and may be clamped to the range that model supports.
Text, streaming, structured outputs, function calling, image input, file input, and web search work on both OpenAI and Gemini. For web search, send a Responses web_search tool; on Gemini we map it to google_search. Pass files as base64 data URLs (PDFs are the most reliable on Gemini).
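Since base64 data URLs are the recommended way to pass images and files, here is a minimal Python sketch that builds one from raw bytes. The helper name to_data_url is illustrative. Where the resulting string goes in the request depends on the caller format you use and is not shown.

```python
import base64


def to_data_url(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as a base64 data URL, e.g. for a PDF or an image."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def file_to_data_url(path: str, mime_type: str) -> str:
    """Read a local file and return its contents as a base64 data URL."""
    with open(path, "rb") as f:
        return to_data_url(f.read(), mime_type)
```

For a PDF you would call file_to_data_url("report.pdf", "application/pdf"), where report.pdf is a placeholder path.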
We add new flex-capable models from OpenAI and Gemini as the providers ship them. If a model you expect is missing, check that you are using its alias and that the provider offers a flex tier for it.