> ## Documentation Index
> Fetch the complete documentation index at: https://flexinference.mintlify.site/llms.txt
> Use this file to discover all available pages before exploring further.

# Pricing and billing

> What FlexInference charges, how BYOK works, and when your card is charged.

FlexInference is **free to route**. The only thing you ever pay us is a cut of the flex requests that actually save you money. Even then, you keep the bigger share.

## What you pay

FlexInference charges **20% of the money saved** on a flex request. A flex request is a call where you set [`start_within`](/deadline-routing) to a duration, a time budget you put in the request. Only a **priced, non-Anthropic model** can run one, which means the flex-capable OpenAI and Gemini [models](/models).

`start_within` is a duration you put in the request, like 30 seconds. It tells FlexInference how long it may spend getting you a cheaper price before it must fall back to your standard model. It is a time budget you set, not a promise from us.

Here is how a flex request works. You set a time budget. FlexInference tries a cheaper tier first. If that cheaper tier cannot finish inside your time budget, FlexInference falls back to your standard model so the request still completes. Flex goes cheap first and steps up to standard, never the other way around.

The saving is the difference between the provider's standard price for that call and the cheaper flex price you actually got. FlexInference takes 20% of that difference. You keep the other 80%. When a flex request saves nothing, you owe nothing.

<Note>
  FlexInference charges the fee on the **saving**, never on the tokens. You always come out ahead of the standard price. In the worst case the saving is zero, so the fee is zero too.
</Note>

## What is free

* **All `default`, `priority`, and `auto` routing**, on any model. These proxy straight to the provider. FlexInference does not try a cheaper tier, so there is no saving to share and no fee.
* **All Anthropic (Claude) models.** Claude is proxy-only. It has no flex tier, so a `claude-*` request is never a billable flex request. See [models](/models).

Only a **priced flex request on an OpenAI or Gemini model** ever costs you a fee.

## Bring your own key (BYOK)

FlexInference routes with **your** provider keys. Your provider (OpenAI, Gemini, or Anthropic) bills you **directly** for the tokens, at their rates. FlexInference adds **no markup** on tokens. FlexInference never touches your token spend. The only charge from FlexInference is the 20% flex fee described above.

## How your card is charged

Add or update a card on the dashboard **Billing** page. Your unpaid balance grows a little with each flex request that saves you money, and:

* FlexInference charges your card **automatically every month** once your unpaid balance reaches **\$20**.
* A balance \*\*under $20 rolls over** to the next month, and FlexInference charges it once it crosses $20.

## Past-due and the 402

If a charge fails, or you owe a fee with **no valid card on file**, the organization goes **past-due**. While past-due:

* Billable flex requests are paused and return a [`402 payment_required`](/errors#payment_required).
* **Free routing (`default`, `priority`, `auto`) and all Anthropic models keep working.** Only priced flex is affected.

A paused flex request returns `402 payment_required` with a body that tells you what failed and how to clear it. See [the 402 error](/errors#payment_required) for the exact JSON.

To clear it, add or update a card on the dashboard **Billing** page, then retry.

<CardGroup cols={2}>
  <Card title="Pricing details" icon="tag" href="https://www.flexinference.com/pricing">
    Full pricing on the FlexInference marketing site.
  </Card>

  <Card title="The 402 error" icon="triangle-exclamation" href="/errors#payment_required">
    How `payment_required` looks on the wire and how to handle it.
  </Card>
</CardGroup>
