# API limits
Every cloud AI provider enforces limits on how much data you can send, how much output the model can produce, and how many requests you can make per minute. This page summarises the limits that matter for coaching assessments so you know what to expect before you hit them.
## Context windows and output limits
The context window is how much text the model can read in a single request; the output limit is how many tokens the model can write back. A typical coaching transcript runs 8,000 to 20,000 tokens depending on session length. The assessment response (with all ICF competency ratings and evidence) uses 15,000 to 35,000 output tokens, depending on the credential level and whether adaptive thinking is enabled.
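To estimate where a transcript falls in that range before sending it, a common rule of thumb is roughly 4 characters per token for English prose. This is a rough sketch, not the platform's actual tokenizer (real tokenizers such as `tiktoken` or a provider's token-counting endpoint are more accurate); the helper name is illustrative:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English prose.
    Provider tokenizers give exact counts; this is only a planning aid."""
    return len(text) // 4

# A 60-minute session of ~50,000 characters lands near 12,500 tokens,
# inside the 8,000-20,000 range quoted above.
transcript = "coach: ..." * 5_000  # ~50,000 characters of placeholder text
print(estimate_tokens(transcript))  # 12500
```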
The platform sets a default max output for each provider based on their documented API limits. The "API max output" column shows the hard ceiling enforced by the provider's API for the recommended model. The "Platform default" column is what the platform sends if you have not changed the value in Advanced tuning.
| Provider | Recommended model | Context window | API max output | Platform default |
|---|---|---|---|---|
| OpenAI | o3 / o3-mini | 200K | 100,000 | 65,536 |
| Anthropic | claude-sonnet-4-6 | 1M | 64,000 | 65,536 |
| xAI (Grok) | grok-4-1-fast-reasoning | 2M | Not published | 65,536 |
| Perplexity | sonar-reasoning-pro | 128K | Not published | 65,536 |
| Mistral | magistral-medium / mistral-large | 128K-256K | Not published | 65,536 |
| Groq | qwen/qwen3-32b | 131K | 40,960 | 32,768 |
| Groq | llama-3.3-70b-versatile | 131K | 32,768 | 32,768 |
| Google Gemini | gemini-2.5-flash / pro | 1M | 65,536 | 65,536 |
You can adjust max output tokens per provider in Settings > AI > Advanced tuning. If you set a value higher than the model supports, the platform will automatically retry with a safe default. You do not need to know the exact API limit for your model.
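The fallback described above can be sketched as a simple retry: try the configured value first, and if the provider rejects it as too large, resend with the safe default. The exception type and function names here are illustrative, not the platform's actual code (real SDKs raise their own 400-error types):

```python
class BadRequestError(Exception):
    """Stand-in for a provider's HTTP 400 error; real SDKs define their own."""

def request_with_fallback(send, max_tokens: int, safe_default: int = 32_768):
    """Try the user-configured max_tokens; if the provider rejects it as
    exceeding the model limit, retry once with the provider's safe default."""
    try:
        return send(max_tokens=max_tokens)
    except BadRequestError:
        return send(max_tokens=safe_default)
```

For example, against a provider whose real ceiling is 32,768 tokens, a request for 100,000 fails once and then succeeds on the retry at the safe default.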
When Claude uses adaptive thinking (the default for deep analysis), the model reasons internally before writing the assessment. Those reasoning tokens count against the max output limit. The platform sets Anthropic's default to 65,536 tokens to leave room for both reasoning and content. If you reduce this, assessments may be truncated.
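Because reasoning and content share one budget, you can sanity-check a custom value before lowering it. This is a sketch of the arithmetic, not the platform's code; the function name and the 30,000-token reasoning figure are illustrative assumptions:

```python
def output_budget_is_safe(max_output: int, content_tokens: int,
                          reasoning_tokens: int) -> bool:
    """True if the configured max output leaves room for both the model's
    internal reasoning and the written assessment (they share one limit)."""
    return max_output >= content_tokens + reasoning_tokens

# Worst-case assessment (35,000 tokens) plus an assumed 30,000 reasoning tokens:
print(output_budget_is_safe(65_536, 35_000, 30_000))  # True: default is safe
print(output_budget_is_safe(32_000, 35_000, 30_000))  # False: would truncate
```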
## Rate limits
Rate limits control how many requests and tokens a provider allows per minute. They are checked when a request starts, not during streaming. Once a response begins streaming, it runs to completion regardless of rate limits.
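The platform paces requests for you, but the mechanics are easy to picture as a per-minute token budget checked before each request starts. A minimal sketch, assuming a fixed-window budget (providers' actual accounting varies):

```python
import time

class TokenBudget:
    """Fixed-window pacing sketch: track estimated tokens used this minute
    and wait out the window before starting a request that would exceed it.
    Mirrors the fact that limits are checked at request start, not mid-stream."""

    def __init__(self, tokens_per_minute: int):
        self.limit = tokens_per_minute
        self.used = 0
        self.window_start = time.monotonic()

    def wait_for(self, estimated_tokens: int) -> None:
        elapsed = time.monotonic() - self.window_start
        if elapsed >= 60:  # new minute: reset the window
            self.used, self.window_start = 0, time.monotonic()
        elif self.used + estimated_tokens > self.limit:
            time.sleep(60 - elapsed)  # wait out the current window
            self.used, self.window_start = 0, time.monotonic()
        self.used += estimated_tokens
```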
Most platform users are on entry-level API tiers. These are the limits you will encounter when you first sign up and add a small amount of credit.
| Provider | Tier | Requests/min | Input tokens/min | Output tokens/min |
|---|---|---|---|---|
| Anthropic | Tier 1 ($5 deposit) | 50 | 30,000 | 8,000 |
| Anthropic | Tier 2 ($40 deposit) | 1,000 | 450,000 | 90,000 |
| OpenAI | Tier 1 ($5 paid) | Varies by model | Varies by model | Varies by model |
| xAI (Grok) | Standard | No published RPM cap | 4,000,000 | 4,000,000 |
| Google Gemini | Free | 10 | 250,000 | 250,000 |
| Groq | Free | 30 | 6,000 to 12,000 | 6,000 to 12,000 |
| Mistral | Tier 1 ($20 spend) | Varies | Varies | Varies |
| Perplexity | Pro subscription | Not published | Not published | Not published |
A full coaching assessment with Claude Sonnet uses roughly 18,000 input tokens and 30,000 output tokens (including reasoning). At Tier 1, the output limit is 8,000 tokens per minute. A single assessment that takes 4 minutes to stream will produce about 7,500 output tokens per minute on average, which is close to the limit. Running two assessments back-to-back may trigger a rate limit error on the second request.
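The arithmetic above can be checked directly with the numbers from the text:

```python
output_tokens = 30_000        # one Sonnet assessment, including reasoning
stream_minutes = 4            # typical streaming time for a full assessment
tier1_output_limit = 8_000    # Anthropic Tier 1, output tokens per minute
tier2_output_limit = 90_000   # Anthropic Tier 2

rate = output_tokens / stream_minutes
print(rate)                           # 7500.0 tokens/min on average
print(rate < tier1_output_limit)      # True, but with almost no headroom
print(2 * rate < tier2_output_limit)  # True: two overlapping runs fit at Tier 2
```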
If you plan to use Anthropic regularly, deposit $40 to reach Tier 2. The jump from 8K to 90K output tokens per minute removes the constraint entirely.
Groq's free tier allows 6,000 to 12,000 tokens per minute depending on the model. A coaching transcript alone (8,000 to 20,000 tokens) exceeds this. You need the Groq Developer tier (paid) to run assessments through the platform.
## How the platform handles limits
The platform does not require you to manage rate limits manually. Here is what happens behind the scenes when a limit is hit.
| Scenario | What happens |
|---|---|
| Anthropic rate limit (429) | The Anthropic SDK automatically retries up to 2 times with exponential backoff. Most transient limits resolve within seconds. |
| Other provider rate limit (429) | The request fails with an error message. Click Generate AI Assessment again to retry. |
| max_tokens exceeds model limit (400) | The platform detects the rejection, reduces max_tokens to the provider's safe default, and retries automatically. No action needed. |
| Unsupported parameter (400) | If a model does not support a parameter like reasoning_effort, the platform removes it and retries automatically. |
| Response truncated (max_tokens reached) | The platform detects the truncation and warns you. Increase max output tokens in Advanced tuning for that provider. |
| Stream stalls (no data for 120 seconds) | The platform cancels the request and reports a timeout. Try again, or increase the timeout in Advanced tuning. |
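The retry behaviour in the first row follows a standard pattern: retry a 429 a bounded number of times, doubling the delay each attempt with a little jitter. This generic sketch uses illustrative names (the Anthropic SDK implements this internally; you do not write it yourself):

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for a provider's HTTP 429 error; real SDKs define their own."""

def with_backoff(call, max_retries: int = 2, base_delay: float = 1.0):
    """Retry a rate-limited call up to max_retries times, sleeping
    base_delay * 2^attempt (plus jitter) between attempts."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries:
                raise  # retries exhausted: surface the error to the caller
            time.sleep(base_delay * (2 ** attempt) + random.random() * 0.1)
```

With the default of 2 retries, a call that is rate-limited twice and then succeeds completes on the third attempt, which matches "most transient limits resolve within seconds."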
## Checking your limits
Each provider has a console page where you can see your current tier and exact rate limits.
| Provider | Where to check |
|---|---|
| Anthropic | console.anthropic.com/settings/limits |
| OpenAI | platform.openai.com/settings/organization/limits |
| xAI (Grok) | console.x.ai/team/default/models |
| Google Gemini | aistudio.google.com > Get API key |
| Groq | console.groq.com/settings/limits |
| Mistral | console.mistral.ai/limits |
| Perplexity | perplexity.ai > Settings > API |
## Official documentation
For full details on each provider's rate limit tiers, pricing, and upgrade paths: