# Cloud Horizon AI Models
Six open-weights frontier models, hosted in the EU.
No closed models. No per-token markup hidden behind a flat fee. We pick six models that cover the full workload spectrum and tell you which one to use for what.
## Pick by job, not by hype
We test the catalog quarterly against a fixed set of EU-relevant tasks. Here is which model wins each job today; we rotate the list when a stronger open-weights release lands.
| Task | Best model | Why this one |
|---|---|---|
| Long-context document Q&A (>200K tokens) | Kimi K2.5 | Only model that holds the full context cleanly |
| Tool calling and structured output | GLM 4.6 | Strongest function-calling benchmarks in the catalog |
| Code generation and refactoring | Qwen 3 Coder | Specialized post-training on programming tasks |
| EU multilingual translation | MiniMax M2.5 | Glossary-aware, preserves code blocks, strong DE/FR/NL/IT |
| Cost-sensitive bulk classification | Llama 3.3 70B | Cheapest input price, solid quality at scale |
| French and German enterprise tone | Mistral Large 3 | European training data, regulated-industry phrasing |
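In application code, the table above can be encoded as a small routing map. A minimal sketch, using the catalog's model IDs; the task keys are illustrative, not part of any API:

```python
# Task-to-model routing based on the catalog's current picks.
# Model IDs come from the catalog; task keys are illustrative.
BEST_MODEL_FOR = {
    "long_context_qa": "kimi-k2.5",
    "tool_calling": "glm-4.6",
    "code": "qwen-3-coder",
    "eu_translation": "minimax-m2.5",
    "bulk_classification": "llama-3.3-70b",
    "fr_de_enterprise_tone": "mistral-large-3",
}

def model_for(task: str) -> str:
    """Return the recommended model ID, defaulting to the cheapest."""
    return BEST_MODEL_FOR.get(task, "llama-3.3-70b")
```

Falling back to Llama 3.3 70B for unlisted tasks mirrors the "sensible default" advice at the end of this page.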
## The full lineup

Six cards, each a serious working tool rather than a demo curiosity. We refresh the lineup quarterly against a fixed eval set and swap a model out within 30 days of a stronger release.
### Kimi K2.5
`kimi-k2.5` · Moonshot AI · Modified MIT

- **Context:** 1,000,000 tokens
- **Speed:** ~95 tok/s
- **Price:** $0.55 in / $2.20 out per 1M tokens

**Strengths:** Long-context reasoning, document analysis, multi-step planning. The strongest open-weights model for ingesting an entire codebase or RFP and reasoning across it.

**Watch out:** Slower than smaller models on short prompts. Overkill for simple classification.

**Recommended for:**
- DevOps incident response with full context
- Long RFP and tender drafting
- AZ-104 and DP-203 syllabus reasoning
- Multi-document Q&A
### GLM 4.6
`glm-4.6` · Zhipu AI · MIT

- **Context:** 200,000 tokens
- **Speed:** ~120 tok/s
- **Price:** $0.50 in / $1.50 out per 1M tokens

**Strengths:** Tool calling, structured JSON output, agentic loops. Frontier-level on coding benchmarks at roughly half the price of Kimi.

**Watch out:** Smaller context than Kimi. English-language bias on some classification tasks.

**Recommended for:**
- KQL and SQL generation
- Architecture trade-off rationale (AZ-305)
- Tool-calling agents
- KYC adverse-media classification
### Qwen 3 Coder
`qwen-3-coder` · Alibaba · Apache 2.0

- **Context:** 128,000 tokens
- **Speed:** ~140 tok/s
- **Price:** $0.30 in / $1.20 out per 1M tokens

**Strengths:** Code generation, refactoring, code review. Specialized post-training on programming tasks. Best in class for Terraform, Bicep, Python, TypeScript.

**Watch out:** Narrower scope than general models. Less strong on free-form reasoning.

**Recommended for:**
- CI failure summarization
- PR review automation
- Lab walkthrough auto-grading (AI-102)
- Terraform from intent
### MiniMax M2.5
`minimax-m2.5` · MiniMax · Apache 2.0

- **Context:** 256,000 tokens
- **Speed:** ~110 tok/s
- **Price:** $0.40 in / $1.60 out per 1M tokens

**Strengths:** Multilingual reasoning, especially DE, FR, NL, IT, ES, PL, SV. Strong glossary-aware translation that preserves code blocks and technical terms.

**Watch out:** Less competitive on US English benchmarks than Kimi or GLM.

**Recommended for:**
- Course material translation
- Multilingual onboarding flows
- MS-102 Microsoft 365 multilingual scenarios
- EU SaaS localised marketing
### Llama 3.3 70B
`llama-3.3-70b` · Meta · Llama 3 Community License

- **Context:** 128,000 tokens
- **Speed:** ~160 tok/s
- **Price:** $0.20 in / $0.80 out per 1M tokens

**Strengths:** The price-performance default. Cheapest model in the catalog. Solid general-purpose reasoning, strong instruction following.

**Watch out:** Much smaller context window than Kimi. Beaten by GLM on tool calling.

**Recommended for:**
- Customer support copilots at volume
- Bulk content classification
- In-product Q&A on indexed help content
- Cost-sensitive workloads
### Mistral Large 3
`mistral-large-3` · Mistral AI · Mistral Research / Commercial

- **Context:** 256,000 tokens
- **Speed:** ~100 tok/s
- **Price:** $0.70 in / $2.10 out per 1M tokens

**Strengths:** European-trained, strong on French, German, Italian, Spanish. The model your French clients will recognise by name. Strong on regulated-industry phrasing.

**Watch out:** Pricier than Llama for similar tasks. License has commercial-use restrictions worth checking.

**Recommended for:**
- French and Italian regulated content
- EU AI Act compliance drafting
- Public-sector tender drafting (FR)
- Brand-safe European copy
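With per-token prices on every card, comparing the cost of a workload across the six models is simple arithmetic. A minimal sketch with the card prices hardcoded (USD per 1M tokens; a hypothetical helper, not part of any SDK):

```python
# (input, output) prices in USD per 1M tokens, copied from the model cards.
PRICES = {
    "kimi-k2.5": (0.55, 2.20),
    "glm-4.6": (0.50, 1.50),
    "qwen-3-coder": (0.30, 1.20),
    "minimax-m2.5": (0.40, 1.60),
    "llama-3.3-70b": (0.20, 0.80),
    "mistral-large-3": (0.70, 2.10),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost for a single call at the listed prices."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000
```

For example, a bulk-classification batch of 500K input and 50K output tokens on Llama 3.3 70B comes to $0.14 at these rates.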
## Choosing the right model

**Default to Llama 3.3 70B; upgrade when you need to.**
For most workloads and budgets, Llama 3.3 70B is the sensible default. Move to Kimi when context exceeds 100K tokens. Move to GLM for tool-calling agents. Move to Qwen for code. Move to MiniMax for multilingual. Move to Mistral when your French enterprise client asks who built the model.
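These upgrade rules can be sketched as one ordered check. The flag names are illustrative; where rules overlap (e.g. a coding task on a 300K-token context), this sketch resolves context length first, since only Kimi holds very long inputs:

```python
def choose_model(
    context_tokens: int,
    needs_tools: bool = False,
    is_code: bool = False,
    multilingual: bool = False,
    french_enterprise: bool = False,
) -> str:
    """Apply the catalog's upgrade rules; Llama 3.3 70B is the default."""
    if french_enterprise:
        return "mistral-large-3"
    if context_tokens > 100_000:
        return "kimi-k2.5"  # only pick for very long context
    if is_code:
        return "qwen-3-coder"
    if needs_tools:
        return "glm-4.6"
    if multilingual:
        return "minimax-m2.5"
    return "llama-3.3-70b"  # the price-performance default
```

The check order is a design choice, not a catalog rule: hard constraints (language expectations, context size) come before task specialisation, and the cheapest model wins when nothing forces an upgrade.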
See the model docs