Major frontier models compared across capability, access, and deployment dimensions
| Attribute | GPT-4o (OpenAI) | Claude 3.5 Sonnet (Anthropic) | Gemini 1.5 Pro (Google) | Llama 3.3 70B (Meta) | Mistral Large 2 (Mistral AI) |
|---|---|---|---|---|---|
| **Access & Weights** | | | | | |
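| License | Closed API | Closed API | Closed API | Open weights (Llama 3.3 Community License) | Open weights (Mistral Research License) |
| Commercial use | Via paid API | Via paid API | Via paid API | Permitted; services with >700M monthly active users need a separate license from Meta | Research license is non-commercial; commercial use requires a Mistral commercial license or the paid API |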
| **Context & Multimodality** | | | | | |
| Context window | 128K tokens | 200K tokens | 1M tokens (2M available) | 128K tokens | 128K tokens |
| Vision / images | Yes | Yes | Yes | No | No |
| Audio / speech | Yes (native) | No | Yes (native) | No | No |
| Video input | No | No | Yes | No | No |
| **Benchmark Performance** | | | | | |
| MMLU (knowledge) | ~88% | ~88–90% | ~85% | ~86% | ~84% |
| HumanEval (code) | ~90% | ~92% | ~71% | ~88% | ~68% |
| MATH (reasoning) | ~76% | ~78% | ~67% | ~77% | ~56% |
| **Strengths & Fit** | | | | | |
| Primary strength | Multimodal, broad capability, tool use | Long docs, reasoning, safety, coding | Very long context, multimodal, search | Self-hosting, privacy, cost at scale | Efficiency, EU-based provider |
| Best use case | General assistant, agents, vision | Long-form analysis, code review, writing | Video/audio analysis, enterprise search | On-prem RAG, fine-tuning, cost control | EU data residency, fast inference |
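| Safety alignment | RLHF | RLHF + Constitutional AI | RLHF | SFT + RLHF by Meta; deployers add their own safeguards (e.g. Llama Guard) | Instruction-tuned; alignment method not publicly detailed |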
| **Cost & Inference** | | | | | |
| Input cost (approx.) | $2.50 / 1M tok | $3.00 / 1M tok | $1.25 / 1M tok (prompts ≤128K) | No license fee; ~$0.10 / 1M tok self-hosted (hardware-dependent) | $2.00 / 1M tok (API) |
| Self-host VRAM req. (weights only) | Not available | Not available | Not available | ~140 GB (bf16); ~40 GB (4-bit) | ~246 GB (bf16); ~65 GB (4-bit) |
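| Fine-tuning support | Via OpenAI API | Limited (Claude 3 Haiku only, via Amazon Bedrock) | Supervised tuning on Vertex AI | Full fine-tuning or LoRA/QLoRA on open weights | Full fine-tuning or LoRA/QLoRA on open weights |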
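The self-hosting VRAM figures follow from simple arithmetic: weight memory is roughly the parameter count times the bytes stored per parameter. The Python sketch below assumes rounded parameter counts (about 70B for Llama 3.3 70B and about 123B for Mistral Large 2), treats 1 GB as 10^9 bytes, and counts weights only; the helper name `weight_memory_gb` is hypothetical.

```python
# Bytes per parameter for common storage precisions. 4-bit formats also store
# per-group scales, so real quantized checkpoints run somewhat larger.
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billion: float, precision: str = "bf16") -> float:
    """Memory for the model weights alone, in GB (1 GB = 1e9 bytes).

    Excludes KV cache, activations, and runtime overhead, which grow with
    batch size and context length.
    """
    # (params_billion * 1e9 params) * bytes/param / (1e9 bytes/GB)
    return params_billion * BYTES_PER_PARAM[precision]


# Rounded parameter counts (assumption): Llama 3.3 70B ~70B, Mistral Large 2 ~123B.
for name, params in [("Llama 3.3 70B", 70), ("Mistral Large 2", 123)]:
    for precision in ("bf16", "int4"):
        print(f"{name} @ {precision}: ~{weight_memory_gb(params, precision):.0f} GB")
```

Actual deployments need headroom beyond these numbers for the KV cache, which grows with context length and concurrent requests.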
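Per-token API prices become a budget estimate by multiplying expected token volume by the rate. This is a minimal sketch using the approximate input prices from the table; the workload figures (`tokens_per_request`, `requests_per_day`) and the helper `monthly_input_cost` are hypothetical, and output tokens, which are billed separately at higher rates, are ignored.

```python
# Approximate input prices in USD per 1M tokens, copied from the table.
# Prices change over time; output tokens are billed separately and ignored here.
# Gemini 1.5 Pro's listed rate applies to prompts up to 128K tokens.
INPUT_PRICE_PER_M = {
    "GPT-4o": 2.50,
    "Claude": 3.00,
    "Gemini 1.5 Pro": 1.25,
    "Mistral Large": 2.00,
}


def monthly_input_cost(model: str, tokens_per_request: int,
                       requests_per_day: int, days: int = 30) -> float:
    """Estimated USD spent on input tokens over `days` days."""
    total_tokens = tokens_per_request * requests_per_day * days
    return total_tokens / 1_000_000 * INPUT_PRICE_PER_M[model]


# Hypothetical workload: 2,000-token prompts, 1,000 requests per day.
for model in INPUT_PRICE_PER_M:
    cost = monthly_input_cost(model, tokens_per_request=2_000, requests_per_day=1_000)
    print(f"{model}: ~${cost:,.2f} / month")
```

Because cost scales linearly with token volume, the relative ranking of the APIs stays the same at any traffic level; only output pricing and long-prompt tiers can change it.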