API limits

Every cloud AI provider enforces limits on how much data you can send, how much output the model can produce, and how many requests you can make per minute. This page summarises the limits that matter for coaching assessments so you know what to expect before you hit them.

Context windows and output limits

The context window is how much text the model can read in a single request. The output limit is how many tokens the model can write back. A typical coaching transcript is 8,000 to 20,000 tokens depending on session length. The assessment response (with all ICF competency ratings and evidence) uses 15,000 to 35,000 tokens depending on the credential level and whether adaptive thinking is enabled.
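To see whether a given transcript fits, you can apply the common rough heuristic of about four characters per token. The sketch below is illustrative only (the `estimate_tokens` and `fits_in_context` helpers are hypothetical, not part of the platform), and real tokenizers vary by model:

```python
# Rough token estimate using the ~4 characters/token heuristic.
# Illustrative sketch, not the platform's actual accounting.

def estimate_tokens(text: str) -> int:
    """Approximate token count; real tokenizers vary by model."""
    return max(1, len(text) // 4)

def fits_in_context(transcript: str, context_window: int = 200_000,
                    expected_output: int = 35_000) -> bool:
    """Check whether transcript plus worst-case assessment output fit."""
    return estimate_tokens(transcript) + expected_output <= context_window

# A 60,000-character transcript is roughly 15,000 tokens:
print(estimate_tokens("x" * 60_000))  # → 15000
```

At that size, a transcript plus a 35,000-token assessment sits comfortably inside every context window listed below.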

The platform sets a default max output for each provider based on their documented API limits. The "API max output" column shows the hard ceiling enforced by the provider's API for the recommended model. The "Platform default" column is what the platform sends if you have not changed the value in Advanced tuning.

| Provider | Recommended model | Context window | API max output | Platform default |
|---|---|---|---|---|
| OpenAI | o3 / o3-mini | 200K | 100,000 | 65,536 |
| Anthropic | claude-sonnet-4-6 | 1M | 64,000 | 65,536 |
| xAI (Grok) | grok-4-1-fast-reasoning | 2M | Not published | 65,536 |
| Perplexity | sonar-reasoning-pro | 128K | Not published | 65,536 |
| Mistral | magistral-medium / mistral-large | 128-256K | Not published | 65,536 |
| Groq | qwen/qwen3-32b | 131K | 40,960 | 32,768 |
| Groq | llama-3.3-70b-versatile | 131K | 32,768 | 32,768 |
| Google Gemini | gemini-2.5-flash / pro | 1M | 65,536 | 65,536 |

You can adjust max output tokens per provider in Settings > AI > Advanced tuning. If you set a value higher than the model supports, the platform will automatically retry with a safe default. You do not need to know the exact API limit for your model.
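The clamping behaviour can be sketched as follows. This is a minimal illustration, not the platform's implementation; the `API_MAX_OUTPUT` table repeats the documented ceilings from the table above, with `None` standing in for "Not published":

```python
# Illustrative sketch: clamp a user-configured max output to the provider's
# documented API ceiling before sending. Values are from the table above;
# None means the provider publishes no hard limit.
API_MAX_OUTPUT = {
    "openai": 100_000,
    "anthropic": 64_000,
    "groq/qwen3-32b": 40_960,
    "groq/llama-3.3-70b": 32_768,
    "gemini": 65_536,
    "xai": None,
}

def effective_max_tokens(provider: str, configured: int) -> int:
    """Return the max_tokens value that would actually be sent."""
    ceiling = API_MAX_OUTPUT.get(provider)
    return configured if ceiling is None else min(configured, ceiling)

print(effective_max_tokens("anthropic", 65_536))  # → 64000
```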

Anthropic and adaptive thinking

When Claude uses adaptive thinking (the default for deep analysis), the model reasons internally before writing the assessment. Those reasoning tokens count against the max output limit. The platform sets Anthropic's default to 65,536 tokens to leave room for both reasoning and content. If you reduce this, assessments may be truncated.
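Because reasoning tokens share the same budget as the written assessment, the content budget is whatever remains after thinking. The arithmetic below is illustrative; the 32,768-token thinking budget is an assumed example value, not a documented platform setting:

```python
# Illustrative budget split for adaptive thinking: reasoning tokens count
# against max_tokens, so the content budget is what remains.
max_tokens = 65_536            # the platform's Anthropic default
thinking_budget = 32_768       # assumed example value, not a platform default
content_budget = max_tokens - thinking_budget
print(content_budget)          # → 32768

# Against the 35,000-token upper bound quoted at the top of this page,
# that content budget would risk truncation:
assessment_upper_bound = 35_000
print(content_budget >= assessment_upper_bound)  # → False
```

This is why reducing the Anthropic max output below the default can truncate assessments even when the number still looks large.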

Rate limits

Rate limits control how many requests and tokens a provider allows per minute. They are checked when a request starts, not during streaming. Once a response begins streaming, it runs to completion regardless of rate limits.

Most platform users are on entry-level API tiers. These are the limits you will encounter when you first sign up and add a small amount of credit.

| Provider | Tier | Requests/min | Input tokens/min | Output tokens/min |
|---|---|---|---|---|
| Anthropic | Tier 1 ($5 deposit) | 50 | 30,000 | 8,000 |
| Anthropic | Tier 2 ($40 deposit) | 1,000 | 450,000 | 90,000 |
| OpenAI | Tier 1 ($5 paid) | Varies by model | Varies by model | Varies by model |
| xAI (Grok) | Standard | No published RPM cap | 4,000,000 | 4,000,000 |
| Google Gemini | Free | 10 | 250,000 | 250,000 |
| Groq | Free | 30 | 6,000 to 12,000 | 6,000 to 12,000 |
| Mistral | Tier 1 ($20 spend) | Varies | Varies | Varies |
| Perplexity | Pro subscription | Not published | Not published | Not published |

Anthropic Tier 1 is tight for coaching assessments

A full coaching assessment with Claude Sonnet uses roughly 18,000 input tokens and 30,000 output tokens (including reasoning). At Tier 1, the output limit is 8,000 tokens per minute. A single assessment that takes 4 minutes to stream will produce about 7,500 output tokens per minute on average, which is close to the limit. Running two assessments back-to-back may trigger a rate limit error on the second request.
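The arithmetic behind that squeeze, using the figures above:

```python
# Worked example: Anthropic Tier 1 output throughput for one assessment.
output_tokens = 30_000      # per assessment, including reasoning tokens
stream_minutes = 4          # typical streaming time for a full assessment
tier1_output_tpm = 8_000    # Anthropic Tier 1 output tokens/min

rate = output_tokens / stream_minutes
print(rate)                 # → 7500.0, just under the 8,000/min limit

# A second assessment streaming in the same minute pushes the
# per-minute total well past the limit:
print(rate * 2 > tier1_output_tpm)  # → True
```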

If you plan to use Anthropic regularly, deposit $40 to reach Tier 2. The jump from 8K to 90K output tokens per minute removes the constraint entirely.

Groq free tier is too low

Groq's free tier allows 6,000 to 12,000 tokens per minute depending on the model. A coaching transcript alone exceeds this. You need the Groq Developer tier (paid) to run assessments through the platform.

How the platform handles limits

The platform does not require you to manage rate limits manually. Here is what happens behind the scenes when a limit is hit.

| Scenario | What happens |
|---|---|
| Anthropic rate limit (429) | The Anthropic SDK automatically retries up to 2 times with exponential backoff. Most transient limits resolve within seconds. |
| Other provider rate limit (429) | The request fails with an error message. Click Generate AI Assessment again to retry. |
| max_tokens exceeds model limit (400) | The platform detects the rejection, reduces max_tokens to the provider's safe default, and retries automatically. No action needed. |
| Unsupported parameter (400) | If a model does not support a parameter like reasoning_effort, the platform removes it and retries automatically. |
| Response truncated (max_tokens reached) | The platform detects the truncation and warns you. Increase max output tokens in Advanced tuning for that provider. |
| Stream stalls (no data for 120 seconds) | The platform cancels the request and reports a timeout. Try again, or increase the timeout in Advanced tuning. |
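The retry-with-backoff pattern described for Anthropic 429s can be sketched generically. This is an illustration of the technique, not the SDK's actual code; `send` is a hypothetical callable that raises on a 429:

```python
import random
import time

class RateLimitError(Exception):
    """Stands in for a provider 429 response."""

def with_backoff(send, max_retries: int = 2, base_delay: float = 0.5):
    """Illustrative exponential backoff with jitter: retry up to
    `max_retries` times, doubling the wait after each failure."""
    for attempt in range(max_retries + 1):
        try:
            return send()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: surface the 429 to the caller
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```

With `max_retries=2` and a 0.5-second base, a request that keeps hitting the limit waits roughly 0.5s and then 1s before finally failing, which matches the "resolve within seconds" behaviour noted above.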

Checking your limits

Each provider has a console page where you can see your current tier and exact rate limits.

| Provider | Where to check |
|---|---|
| Anthropic | console.anthropic.com/settings/limits |
| OpenAI | platform.openai.com/settings/organization/limits |
| xAI (Grok) | console.x.ai/team/default/models |
| Google Gemini | aistudio.google.com > Get API key |
| Groq | console.groq.com/settings/limits |
| Mistral | console.mistral.ai/limits |
| Perplexity | perplexity.ai > Settings > API |

Official documentation

For full details on each provider's rate limit tiers, pricing, and upgrade paths: