# API limits
Every cloud AI provider enforces limits on how much data you can send, how much output the model can produce, and how many requests you can make per minute. This page summarises the limits that matter for coaching assessments so you know what to expect before you hit them.
## Context windows and output limits
The context window is how much text the model can read in a single request; the output limit is how many tokens the model can write back. A typical coaching transcript runs 8,000 to 20,000 tokens depending on session length. The assessment response (with all ICF competency ratings and evidence) uses 15,000 to 35,000 output tokens, depending on the credential level and whether adaptive thinking is enabled.
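To estimate where a transcript falls in that range before sending it, a common rule of thumb is roughly 4 characters per token for English prose. This is a rough sketch, not the platform's actual tokenizer (real tokenizers such as `tiktoken` or a provider's token-counting endpoint are more accurate); the helper name is illustrative:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English prose.
    Provider tokenizers give exact counts; this is only a planning aid."""
    return len(text) // 4

# A 60-minute session of ~50,000 characters lands near 12,500 tokens,
# inside the 8,000-20,000 range quoted above.
transcript = "coach: ..." * 5_000  # ~50,000 characters of placeholder text
print(estimate_tokens(transcript))  # 12500
```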
The platform sets a default max output for each provider based on their documented API limits. The "API max output" column shows the hard ceiling enforced by the provider's API for the recommended model. The "Platform default" column is what the platform sends if you have not changed the value in Advanced tuning.
| Provider | Recommended model | Context window | API max output | Platform default |
|---|---|---|---|---|
| OpenAI | o3 / o3-mini | 200K | 100,000 | 65,536 |
| Anthropic | claude-sonnet-4-6 | 1M | 64,000 | 65,536 |
| xAI (Grok) | grok-4-1-fast-reasoning | 2M | Not published | 65,536 |
| Perplexity | sonar-reasoning-pro | 128K | Not published | 65,536 |
| Mistral | magistral-medium / mistral-large | 128K-256K | Not published | 65,536 |
| Groq | qwen/qwen3-32b | 131K | 40,960 | 32,768 |
| Groq | llama-3.3-70b-versatile | 131K | 32,768 | 32,768 |
| Google Gemini | gemini-2.5-flash / pro | 1M | 65,536 | 65,536 |
You can adjust max output tokens per provider in Settings > AI > Advanced tuning. If you set a value higher than the model supports, the platform will automatically retry with a safe default. You do not need to know the exact API limit for your model.
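The fallback described above can be sketched as a simple retry: try the configured value first, and if the provider rejects it as too large, resend with the safe default. The exception type and function names here are illustrative, not the platform's actual code (real SDKs raise their own 400-error types):

```python
class BadRequestError(Exception):
    """Stand-in for a provider's HTTP 400 error; real SDKs define their own."""

def request_with_fallback(send, max_tokens: int, safe_default: int = 32_768):
    """Try the user-configured max_tokens; if the provider rejects it as
    exceeding the model limit, retry once with the provider's safe default."""
    try:
        return send(max_tokens=max_tokens)
    except BadRequestError:
        return send(max_tokens=safe_default)
```

For example, against a provider whose real ceiling is 32,768 tokens, a request for 100,000 fails once and then succeeds on the retry at the safe default.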
When Claude uses adaptive thinking (the default for deep analysis), the model reasons internally before writing the assessment. Those reasoning tokens count against the max output limit. The platform sets Anthropic's default to 65,536 tokens to leave room for both reasoning and content. If you reduce this, assessments may be truncated.
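Because reasoning and content share one budget, you can sanity-check a custom value before lowering it. This is a sketch of the arithmetic, not the platform's code; the function name and the 30,000-token reasoning figure are illustrative assumptions:

```python
def output_budget_is_safe(max_output: int, content_tokens: int,
                          reasoning_tokens: int) -> bool:
    """True if the configured max output leaves room for both the model's
    internal reasoning and the written assessment (they share one limit)."""
    return max_output >= content_tokens + reasoning_tokens

# Worst-case assessment (35,000 tokens) plus an assumed 30,000 reasoning tokens:
print(output_budget_is_safe(65_536, 35_000, 30_000))  # True: default is safe
print(output_budget_is_safe(32_000, 35_000, 30_000))  # False: would truncate
```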
## Rate limits
Rate limits control how many requests and tokens a provider allows per minute. They are checked when a request starts, not during streaming. Once a response begins streaming, it runs to completion regardless of rate limits.
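The platform paces requests for you, but the mechanics are easy to picture as a per-minute token budget checked before each request starts. A minimal sketch, assuming a fixed-window budget (providers' actual accounting varies):

```python
import time

class TokenBudget:
    """Fixed-window pacing sketch: track estimated tokens used this minute
    and wait out the window before starting a request that would exceed it.
    Mirrors the fact that limits are checked at request start, not mid-stream."""

    def __init__(self, tokens_per_minute: int):
        self.limit = tokens_per_minute
        self.used = 0
        self.window_start = time.monotonic()

    def wait_for(self, estimated_tokens: int) -> None:
        elapsed = time.monotonic() - self.window_start
        if elapsed >= 60:  # new minute: reset the window
            self.used, self.window_start = 0, time.monotonic()
        elif self.used + estimated_tokens > self.limit:
            time.sleep(60 - elapsed)  # wait out the current window
            self.used, self.window_start = 0, time.monotonic()
        self.used += estimated_tokens
```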
Most platform users are on entry-level API tiers. These are the limits you will encounter when you first sign up and add a small amount of credit.
| Provider | Tier | Requests/min | Input tokens/min | Output tokens/min |
|---|---|---|---|---|
| Anthropic | Tier 1 ($5 deposit) | 50 | 30,000 | 8,000 |
| Anthropic | Tier 2 ($40 deposit) | 1,000 | 450,000 | 90,000 |
| OpenAI | Tier 1 ($5 paid) | Varies by model | Varies by model | Varies by model |
| xAI (Grok) | Standard | No published RPM cap | 4,000,000 | 4,000,000 |
| Google Gemini | Free | 10 | 250,000 | 250,000 |
| Groq | Free | 30 | 6,000 to 12,000 | 6,000 to 12,000 |
| Mistral | Tier 1 ($20 spend) | Varies | Varies | Varies |
| Perplexity | Pro subscription | Not published | Not published | Not published |
A full coaching assessment with Claude Sonnet uses roughly 18,000 input tokens and 30,000 output tokens (including reasoning). At Tier 1, the output limit is 8,000 tokens per minute. A single assessment that takes 4 minutes to stream will produce about 7,500 output tokens per minute on average, which is close to the limit. Running two assessments back-to-back may trigger a rate limit error on the second request.
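The arithmetic above can be checked directly with the numbers from the text:

```python
output_tokens = 30_000        # one Sonnet assessment, including reasoning
stream_minutes = 4            # typical streaming time for a full assessment
tier1_output_limit = 8_000    # Anthropic Tier 1, output tokens per minute
tier2_output_limit = 90_000   # Anthropic Tier 2

rate = output_tokens / stream_minutes
print(rate)                           # 7500.0 tokens/min on average
print(rate < tier1_output_limit)      # True, but with almost no headroom
print(2 * rate < tier2_output_limit)  # True: two overlapping runs fit at Tier 2
```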
If you plan to use Anthropic regularly, deposit $40 to reach Tier 2. The jump from 8K to 90K output tokens per minute removes the constraint entirely.
Groq's free tier allows 6,000 to 12,000 tokens per minute depending on the model. A coaching transcript alone (8,000 to 20,000 tokens) exceeds this. You need the Groq Developer tier (paid) to run assessments through the platform.
## How the platform handles limits
The platform does not require you to manage rate limits manually. Here is what happens behind the scenes when a limit is hit.
| Scenario | What happens |
|---|---|
| Anthropic rate limit (429) | The Anthropic SDK automatically retries up to 2 times with exponential backoff. Most transient limits resolve within seconds. |
| Other provider rate limit (429) | The request fails with an error message. Click Generate AI Assessment again to retry. |
| max_tokens exceeds model limit (400) | The platform detects the rejection, reduces max_tokens to the provider's safe default, and retries automatically. No action needed. |
| Unsupported parameter (400) | If a model does not support a parameter like reasoning_effort, the platform removes it and retries automatically. |
| Response truncated (max_tokens reached) | The platform detects the truncation and warns you. Increase max output tokens in Advanced tuning for that provider. |
| Stream stalls (no data for 120 seconds) | The platform cancels the request and reports a timeout. Try again, or increase the timeout in Advanced tuning. |
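The retry behaviour in the first row follows a standard pattern: retry a 429 a bounded number of times, doubling the delay each attempt with a little jitter. This generic sketch uses illustrative names (the Anthropic SDK implements this internally; you do not write it yourself):

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for a provider's HTTP 429 error; real SDKs define their own."""

def with_backoff(call, max_retries: int = 2, base_delay: float = 1.0):
    """Retry a rate-limited call up to max_retries times, sleeping
    base_delay * 2^attempt (plus jitter) between attempts."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries:
                raise  # retries exhausted: surface the error to the caller
            time.sleep(base_delay * (2 ** attempt) + random.random() * 0.1)
```

With the default of 2 retries, a call that is rate-limited twice and then succeeds completes on the third attempt, which matches "most transient limits resolve within seconds."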
## Checking your limits
Each provider has a console page where you can see your current tier and exact rate limits.
| Provider | Where to check |
|---|---|
| Anthropic | console.anthropic.com/settings/limits |
| OpenAI | platform.openai.com/settings/organization/limits |
| xAI (Grok) | console.x.ai/team/default/models |
| Google Gemini | aistudio.google.com > Get API key |
| Groq | console.groq.com/settings/limits |
| Mistral | console.mistral.ai/limits |
| Perplexity | perplexity.ai > Settings > API |
## Official documentation
For full details on each provider's rate limit tiers, pricing, and upgrade paths: