Pricing and billing - FlexInference

FlexInference is free to route. The only thing you ever pay us is a cut of the flex requests that actually save you money. Even then, you keep the bigger share.

What you pay

FlexInference charges 20% of the money saved on a flex request. A flex request is a call where you set start_within to a duration, a time budget you put in the request. Only a priced, non-Anthropic model can run one, which means the flex-capable OpenAI and Gemini models. start_within is a duration you put in the request, like 30 seconds. It tells FlexInference how long it may spend getting you a cheaper price before it must fall back to your standard model. It is a time budget you set, not a promise from us. Here is how a flex request works. You set a time budget. FlexInference tries a cheaper tier first. If that cheaper tier cannot finish inside your time budget, FlexInference falls back to your standard model so the request still completes. Flex goes cheap first and steps up to standard, never the other way around. The saving is the difference between the provider’s standard price for that call and the cheaper flex price you actually got. FlexInference takes 20% of that difference. You keep the other 80%. When a flex request saves nothing, you owe nothing.

FlexInference charges the fee on the saving, never on the tokens. You always come out ahead of the standard price. In the worst case the saving is zero, so the fee is zero too.

What is free

All default, priority, and auto routing, on any model. These proxy straight to the provider. FlexInference does not try a cheaper tier, so there is no saving to share and no fee.
All Anthropic (Claude) models. Claude is proxy-only. It has no flex tier, so a claude-* request is never a billable flex request. See models.

Only a priced flex request on an OpenAI or Gemini model ever costs you a fee.

Bring your own key (BYOK)

FlexInference routes with your provider keys. Your provider (OpenAI, Gemini, or Anthropic) bills you directly for the tokens, at their rates. FlexInference adds no markup on tokens. FlexInference never touches your token spend. The only charge from FlexInference is the 20% flex fee described above.

How your card is charged

Add or update a card on the dashboard Billing page. Your unpaid balance grows a little with each flex request that saves you money, and:

FlexInference charges your card automatically every month once your unpaid balance reaches $20.
A balance **under $20 rolls over** to the next month, and FlexInference charges it once it crosses$ 20.

Past-due and the 402

If a charge fails, or you owe a fee with no valid card on file, the organization goes past-due. While past-due:

Billable flex requests are paused and return a 402 payment_required.
Free routing (default, priority, auto) and all Anthropic models keep working. Only priced flex is affected.

A paused flex request returns 402 payment_required with a body that tells you what failed and how to clear it. See the 402 error for the exact JSON. To clear it, add or update a card on the dashboard Billing page, then retry.

Pricing details

Full pricing on the FlexInference marketing site.

The 402 error

How payment_required looks on the wire and how to handle it.

​What you pay

​What is free

​Bring your own key (BYOK)

​How your card is charged

​Past-due and the 402

Pricing details

The 402 error

What you pay

What is free

Bring your own key (BYOK)

How your card is charged

Past-due and the 402