AI services guide: subscriptions, free tiers, and APIs (March 2026)
A detailed breakdown of every major AI provider in March 2026 – free tiers, chat subscriptions, API pricing, coding agents, model capabilities, mobile SDKs, and open source status.
Two ways to use AI models
Every provider offers two access layers. They are almost always billed separately:
- Chat product – the app you talk to (ChatGPT, Claude, Gemini, Grok, Le Chat, Qwen, Copilot, Perplexity, Nova). Every provider has a free tier with limited usage, and paid subscriptions ($8-300/mo) that unlock premium models, higher limits, and features like image gen, voice, and video.
- API – pay-per-token programmatic access for building apps. Separate billing, separate account, separate pricing.
The only overlaps: Google AI Pro/Ultra subscribers get Google Cloud credits ($10/$100 per month) toward Gemini API usage. Perplexity Pro subscribers get $5/month in API credits. Everyone else: your subscription gets you the chat app, not programmatic access.
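Whether the API ends up cheaper than a subscription depends entirely on volume. A quick sketch, using Claude Haiku 4.5's $1/$5 per-1M rates from the comparison table below (the traffic numbers are made up for illustration):

```python
def monthly_api_cost(requests_per_day, in_tokens, out_tokens,
                     in_price_per_m, out_price_per_m, days=30):
    """Estimate monthly pay-per-token API spend in dollars."""
    total_in = requests_per_day * in_tokens * days
    total_out = requests_per_day * out_tokens * days
    return (total_in * in_price_per_m + total_out * out_price_per_m) / 1_000_000

# 100 requests/day at 2K input / 500 output tokens, Haiku 4.5 rates ($1/$5):
cost = monthly_api_cost(100, 2000, 500, 1.0, 5.0)
print(f"${cost:.2f}/mo")  # $13.50/mo — below a $20 subscription at this volume
```

Double the traffic and the API crosses the $20 line; the break-even point moves with every model and price change, so rerun the arithmetic with current rates.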
Quick comparison
| Provider | Chat product | Cheapest API (in/out per 1M) | Free API tier | Max context | Open source |
|---|---|---|---|---|---|
| Google Gemini | Gemini | $0.10/$0.40 (Flash-Lite 2.5) | Unlimited, no card | 1M | Gemma (open weights) |
| OpenAI | ChatGPT | $0.10/$0.40 (GPT-4.1 Nano) | Limited trial credits | 1M+ | GPT-OSS (Apache 2.0) |
| Anthropic | Claude | $1/$5 (Haiku 4.5) | $5 trial credits | 200K (1M beta) | No |
| xAI | Grok | $0.20/$0.50 (Grok 4.1 Fast) | $25 sign-up credits | 2M | Partial |
| DeepSeek | DeepSeek | $0.028/$0.42 (V3.2 cached) | 5M tokens free | 128K | Yes (MIT) |
| Mistral | Le Chat | $0.02/$0.04 (Nemo) | API trial credits | 256K | Partial |
| Meta Llama | – | Free (self-host) | Models are free | 10M (Scout) | Yes |
| Alibaba Qwen | Qwen | $0.05/$0.20 (Qwen3-8B) | 1M tokens/model | 10M (Qwen-Long) | Yes (Apache 2.0) |
| Amazon Nova | Nova | $0.035/$0.14 (Nova Micro) | $200 AWS credits | 1M (Premier) | No |
| Microsoft Phi | Copilot | $0.075/$0.30 (Phi-4-mini) | GitHub Models free | 128K | Yes (MIT) |
| Perplexity | Perplexity | $1/$1 (Sonar) + per-req fee | No free tier | 200K | No |
| Groq | – | $0.05/$0.08 (Llama 3.1 8B) | Rate-limited, no card | 256K | Hosts OSS only |
“Open weights” vs “open source”: The “Open source” column above uses industry shorthand. Strictly speaking, most providers release open weights (trained model parameters you can download and run) but not full training code or data. Here’s the distinction:
| Term | Model weights | Training code | Training data |
|---|---|---|---|
| Open weights | ✓ | ✗ | ✗ |
| Open source (OSI definition) | ✓ | ✓ | ✓ |
In practice: Apache 2.0 / MIT models (Qwen, DeepSeek, Phi, GPT-OSS) are the closest to true open source. Gemma and Llama release weights under more restrictive licenses. Anthropic, Amazon, and Perplexity release no weights at all.
Subscription comparison
All 12 providers at a glance
| Provider | Chat product | Free tier | Cheapest paid | ~$20 tier | High-end |
|---|---|---|---|---|---|
| Google Gemini | Gemini | 50 daily credits | AI Plus $8/mo | AI Pro $20/mo | AI Ultra $250/mo |
| OpenAI | ChatGPT | ~10 msgs / 5 hrs | Go $8/mo | Plus $20/mo | Pro $200/mo |
| Anthropic | Claude | ~15-40 msgs / 5 hrs | – | Pro $20/mo | Max 20x $200/mo |
| xAI | Grok | ~10 msgs / 2 hrs | X Premium $8/mo | SuperGrok $30/mo | Heavy $300/mo |
| DeepSeek | DeepSeek | Unlimited (no sub) | – | – | – |
| Mistral | Le Chat | ~25 msgs | Student $6/mo | Pro $15/mo | Enterprise (custom) |
| Meta Llama | – | Models are free | – | – | – |
| Alibaba Qwen | Qwen | Unlimited (no sub) | – | – | – |
| Amazon Nova | Nova | Free (US only) | – | – | – |
| Microsoft | Copilot | Limited GPT-5.x | – | Copilot Pro $20/mo | M365 Enterprise $30/seat/mo |
| Perplexity | Perplexity | Limited Pro Searches | – | Pro $20/mo | Max $200/mo |
| Groq | – | Rate-limited API | Pay-as-you-go | – | – |
Feature comparison (~$20 tier)
Seven providers offer a tier around $15-30/month. Here’s what you get:
| Feature | ChatGPT Plus ($20) | Claude Pro ($20) | Gemini AI Pro ($20) | SuperGrok ($30) | Le Chat Pro ($15) | Copilot Pro ($20) | Perplexity Pro ($20) |
|---|---|---|---|---|---|---|---|
| Top model | GPT-5.2 Thinking | Opus 4.6 | Gemini 3.1 Pro | Grok 4.1 | Mistral Large 3 | GPT-5.x | GPT-5.4, Claude, Gemini |
| Context | 32K | 200K (1M beta) | 128K+ | 128K | 128K | – | 200K |
| Image gen | DALL-E (~180/day) | No | Nano Banana Pro | Aurora/Imagine | Yes | DALL-E (M365 only) | Yes |
| Video gen | Sora (720p, 20s) | No | Veo (limited) | Imagine (720p, 6s) | No | No | Yes |
| Voice mode | Advanced (video/screen) | No | Yes | Extended | No | No | No |
| Web search | Yes | Yes | Deep Search | DeepSearch | AFP-verified | Bing | Grounded (citations) |
| Coding agent | Codex | Claude Code | Jules (5x free) | Grok Build (waitlist) | Vibe 2.0 | – | – |
| Deep reasoning | GPT-5.2 Thinking | Extended Thinking | Deep Research | Big Brain Mode | Flash Answers | – | Deep Research |
| Not trained on your data | No | Yes | No | No | Yes | No | No |
| API credits | No | No | $10/mo GCP | No | No | No | $5/mo |
| Unique value | Broadest features | Best coding | Workspace + API credits | 2M context, X data | Cheapest, no telemetry | M365 integration | Search + multi-model |
Best free tier: DeepSeek and Qwen (both unlimited, no sub) or Gemini (50 daily credits).
Best value at ~$20: Claude Pro for coding depth. ChatGPT Plus for broadest features. Gemini AI Pro for Workspace integration and the only one with API credits. Perplexity Pro for search + access to third-party models. Le Chat Pro is cheapest at $15 with no-telemetry.
High-end tiers: Claude Max 20x ($200) for near-unlimited Opus 4.6 + Claude Code. ChatGPT Pro ($200) for unlimited everything + Sora 4K. Google AI Ultra ($250) for 25K credits + YouTube Premium + Deep Think. SuperGrok Heavy ($300) for Grok 4 Heavy multi-agent.
Coding agents comparison
Every major provider now ships a coding agent, and a growing ecosystem of third-party agents builds on their APIs.
First-party agents
| Agent | Provider | Type | Cheapest access | Rules | Memory | MCP | Sub-agents |
|---|---|---|---|---|---|---|---|
| Codex | OpenAI | Cloud sandbox | Plus $20/mo | AGENTS.md | No | Yes | Yes |
| Claude Code | Anthropic | Terminal CLI | Pro $20/mo | CLAUDE.md | Yes | Yes | Yes |
| Jules | Google | Async GitHub | Free (15 tasks/day) | AGENTS.md | Partial | Yes | No |
| Gemini CLI | Google | Terminal CLI | Free (1K req/day) | GEMINI.md | Yes | Yes | Yes |
| Gemini Code Assist | Google | IDE assistant | Free | – | – | – | – |
| Antigravity | Google | Agentic IDE | Free (preview) | .agents/workflows/ | Yes | Yes | Yes |
| Grok Build | xAI | Local CLI | Waitlist | .grok/GROK.md | Partial | Yes | Yes (8x) |
| Vibe 2.0 | Mistral | Terminal CLI | Le Chat Pro $15/mo | AGENTS.md | No | Yes | Yes |
| Qwen Code | Alibaba | Terminal CLI | Free (1K req/day) | QWEN.md | Yes | Yes | Yes |
| Amazon Q Developer | Amazon | IDE + CLI | Free (50 agentic/mo) | .amazonq/rules/ | Partial | Yes | No |
Other first-party agents (not coding-specific): Agent Mode (OpenAI, web automation, Plus $20/mo), Cowork (Anthropic, desktop agent, Pro $20/mo), Computer Use (Anthropic, screen automation, API only).
Third-party agents
These use the APIs above. You pick the model, they provide the IDE/workflow.
| Agent | Type | Pricing | Rules | Memory | MCP | Sub-agents |
|---|---|---|---|---|---|---|
| Cursor | VS Code fork | Free / Pro $20 / Ultra $200 | .cursor/rules/*.mdc | Yes | Yes | Yes |
| Windsurf | Agentic IDE | Free / Pro $15 / Teams $30 | .windsurf/rules/*.md | Yes | Yes | No |
| GitHub Copilot | IDE integration | Free / Pro $10 / Pro+ $39 | .github/copilot-instructions.md | Yes | Yes | Yes |
| Cline | VS Code extension | Free (OSS, pay API) / Teams $20 | .clinerules | Yes | Yes | No |
| Aider | Terminal | Free (OSS, pay API) | CONVENTIONS.md | No | Partial | No |
| Continue | VS Code / JetBrains | Solo free / Team $10/dev | .continue/rules/*.md | No | Yes | No |
Rules: Every agent now supports project-level instruction files, but there is no standard. AGENTS.md is gaining traction across OpenAI, Google, and Mistral tools. GitHub Copilot also reads AGENTS.md and CLAUDE.md.
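There is no fixed schema behind these files; most tools simply read free-form Markdown instructions from the repository root. A minimal, hypothetical AGENTS.md might look like:

```markdown
# AGENTS.md (illustrative example)

## Build & test
- Run `npm test` before proposing a commit.

## Conventions
- TypeScript strict mode; avoid `any`.
- Keep functions under 50 lines; prefer pure helpers.

## Boundaries
- Never edit files under `migrations/` without asking.
```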
MCP: Near-universal. 18 of 19 agents support MCP natively. Aider has only community/experimental support.
Memory: Split. About half have true persistent cross-session memory. Others rely on rules files as static context.
Note: GitHub Copilot is a Microsoft product. The Pro+ tier ($39/mo) includes an autonomous coding agent that opens PRs from GitHub issues, plus model choice across providers.
Google Gemini
gemini.google.com | AI Studio | API docs
The most generous free tier of any provider. No credit card, no trial period, just get an API key and go.
Models
| Model | Context | Input/Output (per 1M) | Notes |
|---|---|---|---|
| Gemini 3.1 Pro | 1M | $2.00 / $12.00 | Flagship reasoning, agentic workflows, 77.1% ARC-AGI-2 |
| Gemini 3 Flash | 1M | $0.50 / $3.00 | Near-Pro reasoning at fraction of cost |
| Gemini 3.1 Flash-Lite | 1M | $0.25 / $1.50 | Most cost-efficient, high-volume workloads |
| Gemini 2.5 Pro | 1M | $1.25 / $10.00 | Deep reasoning, coding, math, science |
| Gemini 2.5 Flash | 1M | $0.30 / $2.50 | Best price-performance for reasoning |
| Gemini 2.5 Flash-Lite | 1M | $0.10 / $0.40 | Fastest and cheapest multimodal model |
All models accept text, image, audio, video, and PDF input. 1M token context across the board.
Specialized models: Imagen 4 (image gen, $0.02-0.06/image), Veo 3.1 (video gen), Gemini Embedding 2 (multimodal embeddings), Computer Use (UI automation).
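Calling the Gemini API from any HTTP client is straightforward; the `generateContent` payload uses a `contents`/`parts` shape rather than OpenAI-style `messages`. A minimal offline sketch (the model ID is one from the table above; verify the current endpoint and IDs in the API docs):

```python
import json
import os
import urllib.request

def gemini_request(prompt, model="gemini-2.5-flash"):
    """Build (but do not send) a Gemini generateContent request."""
    url = (f"https://generativelanguage.googleapis.com/v1beta/"
           f"models/{model}:generateContent")
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        headers={"x-goog-api-key": os.environ.get("GEMINI_API_KEY", ""),
                 "Content-Type": "application/json"},
    )

req = gemini_request("Summarize this PDF in three bullets.")
# urllib.request.urlopen(req) would send it; omitted to keep the sketch offline.
```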
Subscriptions
| Tier | Price | Storage | Models | Key features | API access |
|---|---|---|---|---|---|
| Free | $0 | 15 GB | Gemini 3 | 50 daily AI credits, basic NotebookLM (100 notebooks, 50 queries/day), limited image gen | No |
| AI Plus | $7.99/mo | 200 GB | Gemini 3 Pro | 200 monthly credits, limited Veo 3.1 Fast video, limited Workspace AI, share with 5 family members | No |
| AI Pro | $19.99/mo | 2 TB | Gemini 3.1 Pro | 1,000 monthly credits, Deep Research, Deep Search, NotebookLM (500 notebooks, 300 sources, 500 queries/day), full Workspace AI (Gmail/Docs/Sheets/Slides), Jules (5x free), Gemini Code Assist, Gems (custom AI personas), family sharing | $10/mo GCP credits |
| AI Ultra | $249.99/mo | 30 TB | Gemini 3.1 Pro, Deep Think | 25,000 monthly credits, Deep Think (extended reasoning, exclusive), highest Deep Research access, Veo 3.1 full + early Veo 3.16, Jules (20x free, multi-agent), YouTube Premium included, early access to experimental features, family sharing | $100/mo GCP credits |
Credits do NOT roll over. See subscription plans and Google One AI plans for details.
Free tier (API)
| Model | Requests/min | Tokens/min | Requests/day |
|---|---|---|---|
| Gemini 3 Flash | 10 | 250K | 250 |
| Gemini 3.1 Flash-Lite | 15 | 250K | 1,000 |
| Gemini 2.5 Pro | 5 | 250K | 100 |
| Gemini 2.5 Flash | 10 | 250K | 250 |
| Gemini 2.5 Flash-Lite | 15 | 250K | 1,000 |
Free tier covers Flash and Flash-Lite models across generations (not Pro). No credit card required. Content on free tier may be used to improve Google’s products.
Open source
Gemini itself is proprietary. Gemma (1B-27B params) is Google’s open-weight alternative, built on the same research. Free for commercial use.
Mobile SDK
Google consolidated mobile access under Firebase AI Logic. SDKs for Swift (iOS), Kotlin (Android), Flutter, React Native, Unity, and Web. The older standalone Google AI SDKs are deprecated.

OpenAI
chatgpt.com | platform.openai.com | API docs
The largest model lineup. GPT-5.x for general use, o-series for reasoning, GPT-4.1 for low-latency tool calling.
Models
GPT-5 series:
| Model | Context | Input/Output (per 1M) | Notes |
|---|---|---|---|
| GPT-5.4 | 1M+ | $2.50 / $15.00 | Latest flagship, computer-use, 33% fewer errors vs 5.2 |
| GPT-5.4 Pro | 1M+ | $30.00 / $180.00 | Extended reasoning |
| GPT-5.2 | 400K | $1.75 / $14.00 | Workhorse model, cached input at $0.175 |
| GPT-5 | 400K | $1.25 / $10.00 | Original GPT-5 |
| GPT-5 Mini | 400K | $0.25 / $2.00 | Faster, cheaper |
| GPT-5 Nano | 400K | $0.05 / $0.40 | Cheapest GPT-5, edge-friendly |
O-series (reasoning):
| Model | Context | Input/Output (per 1M) | Notes |
|---|---|---|---|
| o3 | 200K | $2.00 / $8.00 | Advanced reasoning (math, science, coding) |
| o3 Pro | 200K | $20.00 / $80.00 | Extended compute |
| o4 Mini | 200K | $1.10 / $4.40 | Fast reasoning, efficient |
| o4 Mini High | 200K | $1.10 / $4.40 | Same per-token price, higher reasoning compute budget |
O-series models use hidden “reasoning tokens” billed as output. A 500-token visible response may cost 2,000+ tokens total.
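That billing quirk is easy to model. A sketch at o3's $2/$8 rates, with a hypothetical reasoning-token count (actual counts vary per request and are only visible in the usage report):

```python
def o_series_cost(input_tokens, visible_output, reasoning_tokens,
                  in_price=2.00, out_price=8.00):
    """o-series billing: hidden reasoning tokens are charged at the output rate."""
    billed_output = visible_output + reasoning_tokens
    return (input_tokens * in_price + billed_output * out_price) / 1_000_000

# A 500-token visible answer with 1,500 hidden reasoning tokens
# is billed as 2,000 output tokens:
print(o_series_cost(1_000, 500, 1_500))  # 0.018 — i.e. $0.018 for the call
```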
GPT-4.1 series (still available):
| Model | Context | Input/Output (per 1M) | Notes |
|---|---|---|---|
| GPT-4.1 | 1M | $2.00 / $8.00 | Instruction following, tool calling |
| GPT-4.1 Mini | 1M | $0.40 / $1.60 | Same strengths, lower cost |
| GPT-4.1 Nano | 1M | $0.10 / $0.40 | Cheapest in the lineup |
All models accept text and image input. Batch API available at 50% off across all models.
Subscriptions
Individual plans:
| Tier | Price | Models | Message limits | Key features |
|---|---|---|---|---|
| Free | $0 | GPT-5.2 Instant (lightweight) | ~10 per 5 hours, falls back to Mini | Basic DALL-E (2-3 images/day), limited Sora 2, limited Deep Research, basic voice, can use GPTs but not create them |
| ChatGPT Go | $8/mo | GPT-5.2 Instant (unlimited) | ~10x Free | DALL-E, file upload, code interpreter, web browsing, create custom GPTs. No Sora, no Deep Research, no Agent Mode, no Codex |
| ChatGPT Plus | $20/mo | GPT-5.2, GPT-5.2 Thinking, o3-pro | ~3,000/week (Thinking) | DALL-E (~180 images/day), Sora (1,000 credits/mo, 720p, 20s clips), Deep Research, Agent Mode, Codex, Canvas, Tasks, Advanced Voice (video/screen sharing), 32K context |
| ChatGPT Pro | $200/mo | All Plus models + GPT-5.2 Pro | Effectively unlimited | Unlimited DALL-E, Sora (10,000 credits + unlimited relaxed mode, 1080p/4K, 90s clips, no watermark), 128K context (4x Plus), ChatGPT Pulse (exclusive), early access to new features |
Business plans:
| Tier | Price | Notes |
|---|---|---|
| Team | $25-30/user/mo | Same models as Plus with higher Thinking caps, admin console, conversations NOT used for training, shared workspace GPTs |
| Enterprise | Custom | Unlimited higher-speed GPT-5.2, extended context, SSO, SCIM, SOC 2 + ISO compliance, custom data retention, not trained on |
None of these include API credits. See ChatGPT plans for details. API is a completely separate billing system at platform.openai.com.
Free tier (API)
The free trial credit program ($5) was discontinued in mid-2025. Reports conflict on whether new accounts still receive credits. Check platform.openai.com directly. Pay-as-you-go with no minimum spend once you add a payment method.
Open source
OpenAI released two open-weight models under Apache 2.0:
- GPT-OSS-120B: ~117B params (5.1B active via MoE), runs on a single 80GB GPU. Matches o4-mini on reasoning.
- GPT-OSS-20B: Runs on 16GB memory. Matches o3-mini.
Both on Hugging Face with MXFP4 quantization.
Mobile SDK
No official native iOS/Android SDK. The API is standard REST/HTTP. Community options:
- MacPaw/OpenAI – third-party Swift package for iOS/macOS
OpenAI also offers an Apps SDK (preview) for building apps that run inside ChatGPT, and a Realtime API via WebRTC for client-side voice/text streaming.
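Since the API is plain REST, a mobile or server client only needs to build the standard chat-completions payload. This shape is also the de-facto standard that providers described later as "OpenAI-compatible" (xAI, DeepSeek, Perplexity) accept; the model name below is illustrative:

```python
import json
import os
import urllib.request

def chat_request(prompt, model="gpt-5-mini",
                 base_url="https://api.openai.com/v1"):
    """Build (but do not send) an OpenAI-style chat-completions request."""
    body = {"model": model,
            "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode(),
        headers={"Authorization": f"Bearer {os.environ.get('OPENAI_API_KEY', '')}",
                 "Content-Type": "application/json"},
    )

req = chat_request("Hello")
# urllib.request.urlopen(req) would send it; omitted to keep the sketch offline.
```

Swapping `base_url` (and the key) is all it takes to point the same code at a compatible provider.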
Anthropic Claude
claude.ai | console.anthropic.com | API docs
The smallest model lineup but strong across the board. Known for instruction following, safety, and coding quality.
Models
| Model | Context | Input/Output (per 1M) | Notes |
|---|---|---|---|
| Claude Opus 4.6 | 200K (1M beta) | $5 / $25 | Most intelligent; agents, coding, complex reasoning |
| Claude Sonnet 4.6 | 200K (1M beta) | $3 / $15 | Best balance of speed and intelligence |
| Claude Haiku 4.5 | 200K | $1 / $5 | Fastest, near-frontier intelligence |
All models support text + image input and PDF processing (up to 100 images per request). Extended thinking available on all three. 1M context in beta for Opus and Sonnet (via context-1m-2025-08-07 header).
Prompt caching: Cache hits cost 0.1x input price (90% savings). Cache write costs 1.25x-2x depending on TTL (5 min or 1 hour).
Batch API: 50% discount on all models.
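The caching economics are worth working through once. A sketch at Sonnet 4.6's $3/1M input rate, using the 1.25x write multiplier (5-minute TTL) from above:

```python
def anthropic_input_cost(tokens, kind, in_price=3.00):
    """Per-call input cost in dollars at Sonnet 4.6 rates, by cache treatment."""
    multiplier = {"fresh": 1.0, "cache_write": 1.25, "cache_hit": 0.10}[kind]
    return tokens * in_price * multiplier / 1_000_000

# A 50K-token system prompt: written to cache once, then hit on every later call.
first = anthropic_input_cost(50_000, "cache_write")  # $0.1875
later = anthropic_input_cost(50_000, "cache_hit")    # $0.0150
print(first, later)
```

After a single extra-cost write, every subsequent call pays a tenth of the fresh-input price, so caching pays for itself from the second call onward.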
Subscriptions
Individual plans:
| Tier | Price | Models | Usage | Key features |
|---|---|---|---|---|
| Free | $0 | Sonnet 4.6 | ~15-40 msgs per 5-hour window | 200K context, up to 20 file uploads (30 MB each), web search, Projects, Artifacts. No Extended Thinking, no Computer Use, no Claude Code, no Opus. Conversations may be used for training |
| Pro | $20/mo ($17 annual) | Sonnet 4.6, Opus 4.5, Opus 4.6, Haiku 4.5 | 5x Free (~75-200 per 5 hours) | Extended Thinking, Claude Code, Cowork, unlimited Projects, limited Computer Use, priority access. NOT used for training |
| Max 5x | $100/mo | All models including Opus 4.6 | 5x Pro / 25x Free | Higher Extended Thinking compute, Computer Use, 200K context (1M beta, 128K max output) |
| Max 20x | $200/mo | All models including Opus 4.6 | 20x Pro / 100x Free | Everything in Max 5x with 4x the usage ceiling. Best for heavy developers needing near-unlimited Opus 4.6 |
Business plans:
| Tier | Price | Notes |
|---|---|---|
| Team Standard | $25/seat/mo | Same as Pro, shared project folders, admin console, SSO, domain capture. Min 5 seats. NOT trained on |
| Team Premium | $150/seat/mo | Everything in Standard + Claude Code included |
| Enterprise | Custom | Fine-grained RBAC, SCIM provisioning, SAML 2.0 SSO, compliance API, audit logs, custom data retention, usage-based billing with admin spending caps, Claude Code included |
See Claude plans for details. API is separate at console.anthropic.com.
Free tier (API)
$5 in free credits upon creating a developer account (phone verification required, region-dependent). After that, pure pay-as-you-go with tier-based rate limits.
Open source
Not open source. No model weights released. Anthropic has been explicit about this as a safety position. The Model Context Protocol (MCP) is open source.
Mobile SDK
No official Swift or Kotlin SDK. Seven official server-side SDKs: Python, TypeScript, Java, Go, Ruby, C#, PHP. Community mobile options:
- SwiftAnthropic – third-party Swift package
- AnthropicKit – third-party Swift SDK
xAI (Grok)
grok.com | x.ai/api | API docs
Largest context window (2M tokens) and aggressive sign-up credits. Built into X (Twitter) with real-time social data access.
Models
| Model | Context | Input/Output (per 1M) | Notes |
|---|---|---|---|
| Grok 4.20 Multi-Agent | 2M | $2.00 / $6.00 | Multi-agent orchestration |
| Grok 4.20 (reasoning) | 2M | $2.00 / $6.00 | Flagship reasoning |
| Grok 4.1 Fast (reasoning) | 2M | $0.20 / $0.50 | Best value |
| Grok Code Fast 1 | 256K | $0.20 / $1.50 | Code-specialized |
Image input supported on Grok 4.x models. Image generation ($0.02-0.07/image) and video generation ($0.05/second) available as separate endpoints.
Subscriptions
| Tier | Price | Models | Limits | Key features |
|---|---|---|---|---|
| Free | $0 | Grok 3 (limited) | ~10 msgs per 2 hours | ~10 Aurora images per 2 hours, basic voice. No DeepSearch, no Think mode |
| X Premium | $8/mo | Grok 3, limited Grok 4 | Higher than Free | Image gen, voice, basic memory. Bundled with blue checkmark + reduced ads. 50% off SuperGrok |
| X Premium+ | $40/mo | Grok 4, Grok 4.1 | ~100 prompts per 2 hours | Limited DeepSearch, limited Think (~30 per 2 hours), extended voice + memory. Bundled with ad revenue sharing + ad-free browsing. 50% off SuperGrok |
| SuperGrok | $30/mo | Grok 4, Grok 4.1 | ~30 per 2 hours (Think/DeepSearch) | 128K context memory, full DeepSearch, Big Brain Mode (extended reasoning), Imagine (image gen + 720p 6s video), Aurora image editing, priority routing |
| SuperGrok Heavy | $300/mo | Grok 4 Heavy (multi-agent) | Dramatically higher | 256K context (2x SuperGrok), maximum compute priority, 100% AIME 2025, Grok 4 Heavy exclusive |
| Grok Business | $30-300/seat/mo | Standard or Heavy | Team-managed | Admin controls, API access included |
See Grok plans for details. API is separate pay-as-you-go for non-Business tiers. $25 in free sign-up credits.
Open source
Grok 1 and Grok 2.5 weights are on Hugging Face, and an open release of Grok 3 is planned. Current frontier models (4.x) are proprietary.
Mobile SDK
No dedicated SDK. The API is OpenAI-compatible, so any OpenAI client library works. Grok mobile app available on iOS and Android.
DeepSeek
chat.deepseek.com | platform.deepseek.com | API docs
The price disruptor. V3.2 costs roughly 95% less than GPT-4 Turbo with competitive quality.
Models
| Model | Context | Input/Output (per 1M) | Cache hit | Notes |
|---|---|---|---|---|
| deepseek-chat (V3.2) | 128K | $0.28 / $0.42 | $0.028 | Non-thinking, 8K max output |
| deepseek-reasoner (V3.2) | 128K | $0.28 / $0.42 | $0.028 | Thinking mode, 64K max output |
Text-only as of March 2026. V4 with multimodal is in development.
Off-peak discount: 50-75% off during 16:30-00:30 GMT.
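Because the discount window wraps past midnight GMT, a naive range check gets it wrong. A small sketch of the window test (discount percentages vary by model, so only the window logic is shown):

```python
from datetime import time

OFF_PEAK_START = time(16, 30)  # 16:30 GMT
OFF_PEAK_END = time(0, 30)     # 00:30 GMT next day

def in_off_peak(t):
    """The window wraps past midnight, so it's a disjunction, not a range."""
    return t >= OFF_PEAK_START or t <= OFF_PEAK_END

in_off_peak(time(23, 0))  # True: inside the evening window
in_off_peak(time(12, 0))  # False: peak hours
```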
Subscriptions
No subscription model. The chat app (web + mobile) is completely free, with effectively unlimited messaging subject to daily fair-use quotas. No paid tier exists – they just give you the models for free. API is pure pay-as-you-go at platform.deepseek.com. New accounts get 5M free tokens (30-day validity).
Open source
Fully open source under MIT license. Weights on Hugging Face for V3.2 and R1. Self-host on your own hardware. A quantized 7B version runs on Android.
Mobile SDK
No dedicated SDK. Official apps on iOS and Android (free). The API is OpenAI-compatible.
Links
- Chat app (free)
- API pricing
- API docs
Mistral AI
chat.mistral.ai | console.mistral.ai | API docs
European provider with the widest range from ultra-cheap (Nemo at $0.02/M) to frontier.
Models
| Model | Context | Input/Output (per 1M) | Notes |
|---|---|---|---|
| Mistral Large 3 | 128K | $0.50 / $1.50 | Flagship |
| Mistral Medium 3 | 128K | $0.40 / $2.00 | Mid-tier |
| Mistral Small 3.1 (24B) | 128K | $0.35 / $0.56 | Efficient |
| Mistral Nemo | 128K | $0.02 / $0.04 | Ultra-budget |
| Codestral | 256K | Varies | Code specialist (22B, 80+ languages) |
| Devstral 2 | 128K | $0.40 / $2.00 | Powers Vibe 2.0 coding agent |
| Pixtral Large | 128K | Varies | Multimodal (124B, text + image, up to 30 images) |
Subscriptions (Le Chat)
| Tier | Price | Limits | Key features |
|---|---|---|---|
| Free | $0 | ~25 msgs before throttling | 128K context, code interpreter, document uploads, web search (AFP-verified), Canvas (web pages/graphs/presentations), image gen. Prompts may be used for training |
| Pro | $14.99/mo | ~6x Free (soft cap, fair use) | 150 Flash Answers/day (ultra-fast, 1,000 words/sec), Vibe 2.0 coding agent, No Telemetry Mode (never trained on), priority responses, early feature access |
| Student | $5.99/mo | Same as Pro | >50% discount, verified students, 12-month validity |
| Team | $24.99/user/mo | Pro per seat | 30 GB/user shared RAG libraries, admin console, data training opt-out default |
| Enterprise | Custom | Custom | On-prem deployment, custom models, multi-step agent pipelines |
See Le Chat pricing for details. API is separate pay-as-you-go at console.mistral.ai.
Open source
Mixed. Apache 2.0: Mistral 7B, Mixtral 8x7B, Mistral Nemo, Mistral Small 3. Non-production license: Codestral. Proprietary: Large, Medium, Pixtral Large.
Mobile SDK
No dedicated SDK. Python and JavaScript/TypeScript client libraries available.
Meta Llama
Not a service – it’s a family of models you can run anywhere. Free weights, massive ecosystem.
Models
| Model | Params (Active/Total) | Context | Multimodal |
|---|---|---|---|
| Llama 4 Scout | 17B / 109B (16 experts) | 10M | Text, image, video |
| Llama 4 Maverick | 17B / 400B (128 experts) | 1M | Text, image, video |
| Llama 4 Behemoth | 288B / 2T (16 experts) | TBD | Yes (still training) |
| Llama 3.3 70B | 70B | 128K | Text only |
| Llama 3.1 8B | 8B | 128K | Text only |
Scout’s 10M token context is the longest of any open model. Scout fits on a single H100 with INT4 quantization.
API pricing (via providers)
| Provider | Llama 4 Scout | Llama 3.3 70B | Speed |
|---|---|---|---|
| Groq | $0.11 / $0.34 | $0.59 / $0.79 | 594 t/s |
| Together AI | – | ~$0.88 / $0.88 | – |
| DeepInfra | – | ~$0.15 / $0.15 | Budget |
| Self-hosted | Free | Free | You pay compute |
Prices vary up to 6.8x across providers for the same model.
Open source
Yes. Llama Community License – free for commercial use (companies under 700M MAU). Weights on Hugging Face and llama.com.
Mobile SDK
No official SDK. On-device inference via llama.cpp, MLX, ONNX. Llama 3.1 8B and Scout can run locally on capable devices.
Alibaba Qwen
qwen.ai | chat.qwen.ai | API platform | API docs
China’s strongest open-source model family. Massive lineup from 0.8B to 480B, all Apache 2.0. The chat app is completely free.
Models
Text generation (API):
| Model | Context | Input/Output (per 1M) | Notes |
|---|---|---|---|
| Qwen3-Max (235B MoE) | 262K | $1.20 / $6.00 | Flagship reasoning |
| Qwen3-Max Thinking | 262K | $0.78 / $3.90 | Thinking mode, flat rate |
| Qwen3.5-Plus (MoE) | 1M | $0.26 / $1.56 | Balanced performance |
| Qwen3.5-Flash (MoE) | 1M | $0.10 / $0.40 | Fast and cheap |
| QwQ-Plus (32B reasoning) | 131K | $0.80 / $2.40 | Reasoning specialist |
| Qwen-Long | 10M | Tiered | Ultra-long document analysis |
Coding models:
| Model | Context | Input/Output (per 1M) | Notes |
|---|---|---|---|
| Qwen3-Coder-Plus (480B MoE) | 1M | $0.65 / $3.25 | Flagship coding, SWE-bench leading |
| Qwen3-Coder-Next (80B MoE) | 262K | $0.07 / $0.30 | Open-weight (Apache 2.0), hybrid attention |
Vision / multimodal: Qwen3-VL-Plus (262K), QVQ-Max (visual reasoning), Qwen3-Omni-Flash (audio + video + text). Specialized: Qwen3-Deep-Research (1M), Qwen-OCR, Qwen-MT (translation, 92 languages).
Tiered pricing: rates increase as input length grows (0-32K cheapest, 128K+ most expensive). Batch API: 50% off.
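Tiered pricing means the effective rate is a function of input length, not a constant. A sketch of the tier lookup; the 0-32K / 32-128K / 128K+ boundaries follow the scheme described above, but the multipliers here are made-up placeholders (check the model's pricing page for real per-tier rates):

```python
# (upper bound in tokens, placeholder rate multiplier) — illustrative only
TIERS = [(32_000, 1.0), (128_000, 1.5), (float("inf"), 2.0)]

def tier_multiplier(input_tokens):
    """Return the rate multiplier for a request of the given input length."""
    for upper, mult in TIERS:
        if input_tokens <= upper:
            return mult

tier_multiplier(10_000)   # cheapest band
tier_multiplier(500_000)  # most expensive band
```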
Open-weight models
| Model | Params (Active) | Context | License |
|---|---|---|---|
| Qwen3.5-397B-A17B | 397B (17B) | 262K | Apache 2.0 |
| Qwen3.5-122B-A10B | 122B (10B) | 262K | Apache 2.0 |
| Qwen3.5-27B (dense) | 27B | 262K | Apache 2.0 |
| Qwen3.5-9B (dense) | 9B | 262K | Apache 2.0 |
| Qwen3.5-4B (dense) | 4B | 262K | Apache 2.0 |
| Qwen3.5-2B (dense) | 2B | 262K | Apache 2.0 |
| Qwen3.5-0.8B (dense) | 0.8B | 262K | Apache 2.0 |
| QwQ-32B | 32B | 32K | Apache 2.0 |
All on Hugging Face and ModelScope. Apache 2.0 = full commercial use.
Subscriptions
Qwen Chat (web, iOS, Android, desktop) is completely free – full access to Qwen3, image gen, video understanding, deep thinking, web search, artifacts. No paid tier.
Coding Plan (developer subscription, Feb 2026): Flat-rate alternative to per-token billing for coding workflows.
| Tier | Price | Requests/month | Works with |
|---|---|---|---|
| Lite | ~$10/mo | 18,000 | Claude Code, Qwen Code, Cline, Cursor |
| Pro | ~$50/mo | 90,000 | Same + higher rate limits |
Free tier (API)
New accounts: 1M tokens free per model (90-day validity). Qwen Code OAuth: 1,000 free requests/day with no billing setup.
Open source
Yes. All Qwen3 and Qwen3.5 open-weight models are Apache 2.0 – full commercial use, no restrictions. Proprietary models (Qwen-Max, QwQ-Plus) are API-only.
Mobile SDK
No official SDK. On-device via llama.cpp, MLX, Ollama, ONNX. The 0.8B-4B models are designed for on-device use. Qwen3.5-2B runs at 30-50 t/s on iPhone/M-series Macs via MLX.
Amazon Nova
aws.amazon.com/nova | nova.amazon.com | Bedrock pricing
AWS’s in-house model family. Cheapest token pricing in the industry at the low end (Nova Micro $0.035/1M input). Consumer chat launched at nova.amazon.com.
Models
Nova 1 (understanding):
| Model | Context | Input/Output (per 1M) | Notes |
|---|---|---|---|
| Nova Micro | 128K | $0.035 / $0.14 | Text only, lowest cost |
| Nova Lite | 300K | $0.06 / $0.24 | Text + image + video |
| Nova Pro | 300K | $0.80 / $3.20 | Best accuracy/cost balance |
| Nova Premier | 1M | $2.50 / $12.50 | Most capable, 200+ languages |
Nova 2 (reasoning, Dec 2025):
| Model | Context | Notes |
|---|---|---|
| Nova 2 Lite | 1M | Built-in code interpreter, web grounding, MCP support |
| Nova 2 Pro | 1M | Advanced multi-step reasoning, agentic workflows (Preview) |
| Nova 2 Omni | – | All modalities (text, image, video, speech in/out) (Preview) |
Creative: Nova Canvas (image gen), Nova Reel (video gen, 1280x720 24fps). Other: Nova Sonic (speech-to-speech), Nova Act ($4.75/agent-hour browser automation).
Batch API: 50% off on-demand pricing.
Subscriptions
Nova Chat – free consumer chat app. Uses your Amazon.com account (not AWS). Powered by Nova Pro. Image creation, code gen. US only at launch.
Alexa+ – consumer voice assistant upgraded with Nova models under the hood.
No traditional subscription tiers like ChatGPT/Claude. Access is either free (Nova Chat) or pay-per-token (Bedrock API).
Free tier (API)
No permanent free API tier. New AWS accounts get $200 in credits ($100 sign-up + $100 onboarding). Credits apply to all AWS services including Bedrock, valid for 6 months. At Nova Micro pricing, that’s ~5.7B input tokens.
Open source
No. All Nova models are proprietary. Weights not released. Amazon publishes detailed technical reports but not weights. Bedrock also hosts third-party open models (Llama, Mistral).
Mobile SDK
AWS Amplify AI Kit for React Native and Android. Direct Bedrock API calls via AWS SDK with SigV4 signing. Also supported via Vercel AI SDK. No first-party Swift SDK (use Objective-C bindings from Swift).
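Bedrock's Converse API takes a `messages` list whose content items are typed blocks rather than plain strings. A sketch of the request body for Nova Micro; the model ID and inference parameters are assumptions to verify in the Bedrock console:

```python
def nova_request(prompt, model_id="amazon.nova-micro-v1:0"):
    """Build the kwargs for a Bedrock Converse call (model ID is an assumption)."""
    return {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": 512, "temperature": 0.3},
    }

# With AWS credentials configured, boto3 would send it:
#   import boto3
#   client = boto3.client("bedrock-runtime", region_name="us-east-1")
#   resp = client.converse(**nova_request("Hello"))
```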
Microsoft Phi
azure.microsoft.com/products/phi | GitHub Models | Azure AI pricing
Best small models in the industry. All MIT-licensed. Phi-4-mini at 3.8B params punches far above its weight.
Models
| Model | Params | Context | Input/Output (per 1M) | Notes |
|---|---|---|---|---|
| Phi-4 | 14B | 16K | $0.125 / $0.50 | Base reasoning/instruction |
| Phi-4-mini | 3.8B | 128K | $0.075 / $0.30 | Grouped-query attention, function calling |
| Phi-4-multimodal | 5.6B | 128K | $0.08 / $0.32 | Text + vision + audio in one model |
| Phi-4-reasoning | 14B | 32K | – | Chain-of-thought fine-tune |
| Phi-4-reasoning-plus | 14B | 32K | – | Higher RL compute |
| Phi-4-reasoning-vision-15B | 15B | 32K | – | Multimodal reasoning (MIT, Mar 2026) |
| Phi-3.5-MoE | 42B (6.6B active) | 128K | $0.16 / $0.64 | 16 experts, efficient inference |
Pricing is on Azure AI Foundry. Phi-4-mini is cheaper per token than GPT-4o-mini.
Subscriptions
Microsoft’s consumer AI is Copilot, not Phi directly:
| Tier | Price | Notes |
|---|---|---|
| Copilot Free | $0 | GPT-5.x limited, Bing integration |
| Copilot Pro | $20/mo | Priority GPT-5.x, M365 integration (Word, Excel, Outlook, Teams) for M365 subscribers |
| M365 Copilot Business | $18-25/user/mo | Full M365 Copilot in all business apps, meeting recap |
| M365 Copilot Enterprise | $30/user/mo | Compliance, audit, data residency, Copilot Studio |
Copilot Pro is the ~$20 tier. It’s an M365 productivity tool, not a general chatbot – useful if you’re deep in the Microsoft ecosystem.
Free tier (API)
GitHub Models is the easiest entry point – free playground + API with rate limits (50-150 req/day). Just needs a GitHub account, no credit card. Phi-4 and Phi-4-mini both available. Paid usage mirrors Azure AI Foundry rates.
Azure free account: $200 credits for 30 days.
Open source
Yes. All Phi models are MIT license – full commercial use, modification, distribution. Weights on Hugging Face. MAI-1 (Microsoft’s ~500B frontier model) is proprietary and not publicly available.
Mobile SDK
No dedicated SDK. On-device via ONNX Runtime Mobile (iOS with CoreML, Android with NNAPI). Pre-quantized ONNX weights on Hugging Face. Phi-4-mini at 3.8B is feasible on devices with 6-8GB RAM.
Links
- Phi product page
- Azure AI pricing
- Phi-4 on Hugging Face
- GitHub Models
- GitHub Copilot plans
- Copilot Pro
Perplexity
perplexity.ai | API docs | API platform
Search-first AI. Every response is grounded in web sources with citations. The Pro subscription includes access to third-party models (GPT-5.4, Claude, Gemini).
Models
| Model | Context | Input/Output (per 1M) | Notes |
|---|---|---|---|
| Sonar | 128K | $1 / $1 + per-req fee | Lightweight search + summarization |
| Sonar Pro | 200K | $3 / $15 + per-req fee | 2x search results, deeper retrieval |
| Sonar Reasoning Pro | 128K | $2 / $8 + per-req fee | Chain-of-thought reasoning |
| Sonar Deep Research | 128K | $2 / $8 + per-req fee | Long-form synthesis, hundreds of sources |
Per-request fees: $5-14 per 1,000 requests depending on model and search context size. This hybrid pricing (tokens + requests) makes Perplexity more expensive per call than pure token-based providers.
Search API: $5 per 1,000 requests (raw web results, no generation).
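The token-plus-request pricing is easier to reason about as a total. A sketch under an assumed mid-range request fee of $8 per 1,000 (the article only gives the $5-14 range):

```python
def sonar_cost(requests: int, in_tok_per_req: int, out_tok_per_req: int,
               in_rate: float, out_rate: float, fee_per_1k: float) -> float:
    """Total USD: token cost (rates per 1M tokens) + per-request search fee."""
    token_cost = requests * (in_tok_per_req * in_rate
                             + out_tok_per_req * out_rate) / 1_000_000
    request_fee = requests * fee_per_1k / 1_000
    return token_cost + request_fee

# 1,000 Sonar Pro calls ($3/$15 per 1M), 1K input / 300 output tokens each,
# assuming an $8/1K request fee:
print(sonar_cost(1_000, 1_000, 300, 3.0, 15.0, 8.0))
```

In this scenario the request fees ($8) exceed the token cost ($7.50) – which is why per-call comparisons against pure token-priced providers favor the latter.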
Subscriptions
| Tier | Price | Key features |
|---|---|---|
| Free | $0 | Basic search, limited Pro Searches |
| Pro | $20/mo ($200/yr) | Unlimited Pro queries, file uploads, image/video gen, third-party models (GPT-5.4, Claude Sonnet/Opus, Gemini 3.1 Pro), $5/mo API credits |
| Max | $200/mo | Everything in Pro + Perplexity Computer (multi-agent), unlimited Labs, early access |
| Enterprise Pro | $40/seat/mo | Team features, SSO, shared Spaces, admin controls |
Pro subscribers get $5/month in API credits. Third-party model access in Pro means you can use GPT-5.4 and Claude Opus through Perplexity’s interface without separate subscriptions.
Free tier (API)
No free API tier. Requires payment method. Pro subscribers get $5/mo in API credits. Startup program: 6 months free Enterprise Pro + $5,000 in API credits.
Open source
No. Sonar models are proprietary (fine-tuned from Llama 3.3 70B and DeepSeek R1, but Perplexity’s weights and search infrastructure are closed).
Mobile SDK
No dedicated mobile SDK, but official Python and TypeScript SDKs. MCP server for Cursor, VS Code, Claude Desktop. OpenAI-compatible API. iOS and Android apps available.
Groq
groq.com | console.groq.com | API docs
Not a model provider – an inference provider. Runs open-source models on custom LPU hardware at the fastest speeds in the industry.
Models & pricing
| Model | Input/Output (per 1M) | Speed |
|---|---|---|
| Llama 4 Maverick | $0.20 / $0.60 | 562 t/s |
| Llama 4 Scout | $0.11 / $0.34 | 594 t/s |
| Llama 3.3 70B | $0.59 / $0.79 | 394 t/s |
| Llama 3.1 8B | $0.05 / $0.08 | 840 t/s |
| GPT-OSS 120B | $0.15 / $0.60 | 500 t/s |
| GPT-OSS 20B | $0.075 / $0.30 | 1,000 t/s |
| Qwen3 32B | $0.29 / $0.59 | 662 t/s |
| Whisper V3 Large (STT) | $0.111/hr | – |
| Whisper V3 Turbo (STT) | $0.04/hr | – |
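The speed column matters as much as price here: throughput in tokens/second bounds how long a generation takes. A small sketch using the Llama 3.1 8B row from the table above:

```python
def gen_time_s(tokens: int, tokens_per_s: float) -> float:
    """Seconds to generate `tokens` output tokens at a given throughput."""
    return tokens / tokens_per_s

# 100K output tokens on Llama 3.1 8B (840 t/s, $0.08 per 1M output):
secs = gen_time_s(100_000, 840)
cost = 100_000 * 0.08 / 1_000_000
print(f"~{secs:.0f}s of generation for ${cost:.4f}")
```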
Free tier
No credit card required. Rate-limited access (e.g., ~6K tokens/min for 70B models, ~500K tokens/day). Never charged – just 429 errors at the limit.
Paid tier (pay-as-you-go) gets 10x the rate limits plus batch processing at 50% off.
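Because the free tier never bills and simply answers 429 at the limit, the idiomatic client-side response is exponential backoff. A minimal sketch – `RateLimited` is a hypothetical exception your HTTP layer would raise on a 429 response:

```python
import time

class RateLimited(Exception):
    """Raised when the API answers 429 Too Many Requests (hypothetical)."""

def with_backoff(call, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Retry `call` on rate-limit errors with exponential backoff.
    Delays double each attempt: 1s, 2s, 4s, ... Other errors propagate."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimited:
            if attempt == max_retries - 1:
                raise  # out of retries
            sleep(base_delay * 2 ** attempt)
```

The injectable `sleep` parameter keeps the helper testable without real waiting; production code would also honor a `Retry-After` header when the server sends one.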
Open source
N/A – Groq doesn’t make models. They host open-source models (Llama, Qwen, GPT-OSS) on custom LPU hardware.
Mobile SDK summary
| Provider | Official iOS SDK | Official Android SDK | Notes |
|---|---|---|---|
| Google Gemini | Yes (Firebase AI Logic, Swift) | Yes (Firebase AI Logic, Kotlin) | Also Flutter, React Native, Unity |
| OpenAI | No | No | Community: MacPaw/OpenAI (Swift) |
| Anthropic | No | No | Community: SwiftAnthropic |
| xAI | No | No | OpenAI-compatible API (any client works) |
| DeepSeek | No | No | OpenAI-compatible API |
| Mistral | No | No | Python/JS SDKs only |
| Meta Llama | No | No | On-device via llama.cpp, MLX, ONNX |
| Alibaba Qwen | No | No | On-device via MLX, Ollama, llama.cpp |
| Amazon Nova | Partial (Amplify AI Kit) | Partial (Amplify AI Kit) | React Native + Android. No native Swift SDK |
| Microsoft Phi | No | No | ONNX Runtime Mobile (iOS CoreML, Android NNAPI) |
| Perplexity | No | No | Python/TS SDKs, MCP server |
| Groq | No | No | OpenAI-compatible API |
Google is the only provider with first-party mobile SDKs. Amazon has partial coverage via Amplify. Everyone else: use REST directly, community packages, or on-device inference (MLX, ONNX, llama.cpp).
Bottom line
Best free chat: DeepSeek (unlimited, no sub), Qwen (full-featured, also free), or Gemini (50 daily credits). All genuinely useful without paying.
Best ~$20 subscription: Claude Pro for coding depth. ChatGPT Plus for broadest features (image, video, voice, agents). Gemini AI Pro for Workspace integration. Perplexity Pro for search + access to third-party models. Le Chat Pro is the cheapest at $15 and offers no-telemetry mode.
Best free API: Gemini. No credit card, generous limits. Groq for fastest open-model inference. Qwen for 1M free tokens per model.
Best budget API: DeepSeek at $0.28/M input (or $0.028 cached). Mistral Nemo at $0.02/M for ultra-cheap. Amazon Nova Micro at $0.035/M for AWS users.
Best coding agent: Claude Code (Pro $20/mo) or Codex (Plus $20/mo) for first-party. Cursor ($20/mo) or Cline (free, BYOM) for third-party. Amazon Q Developer free tier (50 agentic requests/mo) if you’re on AWS.
Best open-source models: Llama 4 Scout (10M context), Qwen3.5 (Apache 2.0, up to 397B), DeepSeek V3.2 (MIT), Phi-4 (MIT, best small models).
Best mobile SDK: Gemini via Firebase AI Logic – only first-party option for iOS/Android/Flutter.
Highest capability: Claude Opus 4.6, GPT-5.4, and Gemini 3.1 Pro are the three frontier models. Pick based on your use case and budget.
Subscriptions and APIs are separate worlds. Pick your subscription for daily use, pick your API for building things. Or just use Gemini/Qwen’s free tier for both.