# Cloud Horizon AI Models
Six open-weights frontier models, hosted in the EU.
No closed models. No per-token markup hidden behind a flat fee. We pick six models that cover the full workload spectrum and tell you which one to use for what.
## Pick by job, not by hype
We test the catalog quarterly against a fixed set of EU-relevant tasks. Here is which model wins each job today; we rotate the list when a stronger open-weights release lands.
| Task | Best model | Why this one |
|---|---|---|
| Long-context document Q&A (>200K tokens) | Kimi K2.5 | Only model that holds the full context cleanly |
| Tool calling and structured output | GLM 4.6 | Strongest function-calling benchmarks in the catalog |
| Code generation and refactoring | Qwen 3 Coder | Specialized post-training on programming tasks |
| EU multilingual translation | MiniMax M2.5 | Glossary-aware, preserves code blocks, strong DE/FR/NL/IT |
| Cost-sensitive bulk classification | Llama 3.3 70B | Cheapest input price, solid quality at scale |
| French and German enterprise tone | Mistral Large 3 | European training data, regulated-industry phrasing |
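In application code, the table above can be encoded as a small routing map. A minimal sketch, using the catalog's model IDs; the task keys are illustrative, not part of any API:

```python
# Task-to-model routing based on the catalog's current picks.
# Model IDs come from the catalog; task keys are illustrative.
BEST_MODEL_FOR = {
    "long_context_qa": "kimi-k2.5",
    "tool_calling": "glm-4.6",
    "code": "qwen-3-coder",
    "eu_translation": "minimax-m2.5",
    "bulk_classification": "llama-3.3-70b",
    "fr_de_enterprise_tone": "mistral-large-3",
}

def model_for(task: str) -> str:
    """Return the recommended model ID, defaulting to the cheapest."""
    return BEST_MODEL_FOR.get(task, "llama-3.3-70b")
```

Falling back to Llama 3.3 70B for unlisted tasks mirrors the "sensible default" advice at the end of this page.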
## The full lineup

Six cards, each a serious working tool rather than a demo curiosity. We refresh the lineup quarterly against a fixed eval set and swap a model out within 30 days of a stronger release.
### Kimi K2.5
`kimi-k2.5` · Moonshot AI · Modified MIT

- **Context:** 1,000,000 tokens
- **Speed:** ~95 tok/s
- **Price:** $0.55 in / $2.20 out per 1M tokens

**Strengths:** Long-context reasoning, document analysis, multi-step planning. The strongest open-weights model for ingesting an entire codebase or RFP and reasoning across it.

**Watch out:** Slower than smaller models on short prompts. Overkill for simple classification.

**Recommended for:**
- DevOps incident response with full context
- Long RFP and tender drafting
- AZ-104 and DP-203 syllabus reasoning
- Multi-document Q&A
### GLM 4.6
`glm-4.6` · Zhipu AI · MIT

- **Context:** 200,000 tokens
- **Speed:** ~120 tok/s
- **Price:** $0.50 in / $1.50 out per 1M tokens

**Strengths:** Tool calling, structured JSON output, agentic loops. Frontier-level on coding benchmarks at roughly half the price of Kimi.

**Watch out:** Smaller context than Kimi. English-language bias on some classification tasks.

**Recommended for:**
- KQL and SQL generation
- Architecture trade-off rationale (AZ-305)
- Tool-calling agents
- KYC adverse-media classification
### Qwen 3 Coder
`qwen-3-coder` · Alibaba · Apache 2.0

- **Context:** 128,000 tokens
- **Speed:** ~140 tok/s
- **Price:** $0.30 in / $1.20 out per 1M tokens

**Strengths:** Code generation, refactoring, code review. Specialized post-training on programming tasks. Best in class for Terraform, Bicep, Python, TypeScript.

**Watch out:** Narrower scope than general models. Less strong on free-form reasoning.

**Recommended for:**
- CI failure summarization
- PR review automation
- Lab walkthrough auto-grading (AI-102)
- Terraform from intent
### MiniMax M2.5
`minimax-m2.5` · MiniMax · Apache 2.0

- **Context:** 256,000 tokens
- **Speed:** ~110 tok/s
- **Price:** $0.40 in / $1.60 out per 1M tokens

**Strengths:** Multilingual reasoning, especially DE, FR, NL, IT, ES, PL, SV. Strong glossary-aware translation that preserves code blocks and technical terms.

**Watch out:** Less competitive on US English benchmarks than Kimi or GLM.

**Recommended for:**
- Course material translation
- Multilingual onboarding flows
- MS-102 Microsoft 365 multilingual scenarios
- EU SaaS localised marketing
### Llama 3.3 70B
`llama-3.3-70b` · Meta · Llama 3 Community License

- **Context:** 128,000 tokens
- **Speed:** ~160 tok/s
- **Price:** $0.20 in / $0.80 out per 1M tokens

**Strengths:** The price-performance default. Cheapest model in the catalog. Solid general-purpose reasoning, strong instruction following.

**Watch out:** Much smaller context window than Kimi. Beaten by GLM on tool calling.

**Recommended for:**
- Customer support copilots at volume
- Bulk content classification
- In-product Q&A on indexed help content
- Cost-sensitive workloads
### Mistral Large 3
`mistral-large-3` · Mistral AI · Mistral Research / Commercial

- **Context:** 256,000 tokens
- **Speed:** ~100 tok/s
- **Price:** $0.70 in / $2.10 out per 1M tokens

**Strengths:** European-trained, strong on French, German, Italian, Spanish. The model your French clients will recognise by name. Strong on regulated-industry phrasing.

**Watch out:** Pricier than Llama for similar tasks. License has commercial-use restrictions worth checking.

**Recommended for:**
- French and Italian regulated content
- EU AI Act compliance drafting
- Public-sector tender drafting (FR)
- Brand-safe European copy
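With per-token prices on every card, comparing the cost of a workload across the six models is simple arithmetic. A minimal sketch with the card prices hardcoded (USD per 1M tokens; a hypothetical helper, not part of any SDK):

```python
# (input, output) prices in USD per 1M tokens, copied from the model cards.
PRICES = {
    "kimi-k2.5": (0.55, 2.20),
    "glm-4.6": (0.50, 1.50),
    "qwen-3-coder": (0.30, 1.20),
    "minimax-m2.5": (0.40, 1.60),
    "llama-3.3-70b": (0.20, 0.80),
    "mistral-large-3": (0.70, 2.10),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost for a single call at the listed prices."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000
```

For example, a bulk-classification batch of 500K input and 50K output tokens on Llama 3.3 70B comes to $0.14 at these rates.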
## Choosing the right model

**Default to Llama 3.3 70B; upgrade when you need to.**
For most workloads and budgets, Llama 3.3 70B is the sensible default. Move to Kimi when context exceeds 100K tokens. Move to GLM for tool-calling agents. Move to Qwen for code. Move to MiniMax for multilingual. Move to Mistral when your French enterprise client asks who built the model.
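These upgrade rules can be sketched as one ordered check. The flag names are illustrative; where rules overlap (e.g. a coding task on a 300K-token context), this sketch resolves context length first, since only Kimi holds very long inputs:

```python
def choose_model(
    context_tokens: int,
    needs_tools: bool = False,
    is_code: bool = False,
    multilingual: bool = False,
    french_enterprise: bool = False,
) -> str:
    """Apply the catalog's upgrade rules; Llama 3.3 70B is the default."""
    if french_enterprise:
        return "mistral-large-3"
    if context_tokens > 100_000:
        return "kimi-k2.5"  # only pick for very long context
    if is_code:
        return "qwen-3-coder"
    if needs_tools:
        return "glm-4.6"
    if multilingual:
        return "minimax-m2.5"
    return "llama-3.3-70b"  # the price-performance default
```

The check order is a design choice, not a catalog rule: hard constraints (language expectations, context size) come before task specialisation, and the cheapest model wins when nothing forces an upgrade.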
See the model docs