
Cloud Horizon AI / Models

Six open-weights frontier models, hosted in the EU.

No closed models. No per-token markup hidden behind a flat fee. We pick six models that cover the full workload spectrum and tell you which one to use for what.

Pick by job, not by hype

We test the catalog quarterly against a fixed set of EU-relevant tasks. Here is which model wins which job today. We rotate the list when a stronger open-weights release lands.

| Task | Best model | Why this one |
| --- | --- | --- |
| Long-context document Q&A (>200K tokens) | Kimi K2.5 | Only model that holds the full context cleanly |
| Tool calling and structured output | GLM 4.6 | Strongest function-calling benchmarks in the catalog |
| Code generation and refactoring | Qwen 3 Coder | Specialized post-training on programming tasks |
| EU multilingual translation | MiniMax M2.5 | Glossary-aware, preserves code blocks, strong DE/FR/NL/IT |
| Cost-sensitive bulk classification | Llama 3.3 70B | Cheapest input price, solid quality at scale |
| French and German enterprise tone | Mistral Large 3 | European training data, regulated-industry phrasing |
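The routing table above can be kept in code as a plain mapping from job to hosted model ID. A minimal sketch — the task keys are illustrative labels of our own, not an API the catalog defines:

```python
# Quarterly-refreshed routing table: job -> hosted model ID.
# Model IDs match the catalog cards; task keys are illustrative.
BEST_MODEL_FOR = {
    "long_context_qa": "kimi-k2.5",
    "tool_calling": "glm-4.6",
    "code": "qwen-3-coder",
    "translation_eu": "minimax-m2.5",
    "bulk_classification": "llama-3.3-70b",
    "fr_de_enterprise_tone": "mistral-large-3",
}

def model_for(task: str) -> str:
    """Return the recommended model ID, defaulting to the cheapest generalist."""
    return BEST_MODEL_FOR.get(task, "llama-3.3-70b")
```

Keeping the mapping in one place means a quarterly rotation is a one-line diff, not a hunt through application code.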

The full lineup

Six cards. Each one is a serious working tool, not a demo curiosity. We refresh quarterly against a fixed eval set and swap a model out within 30 days of a stronger open-weights release.

Kimi K2.5

kimi-k2.5 · Moonshot AI · Modified MIT

Context: 1,000,000 tokens · Speed: ~95 tok/s
Price: $0.55 in / $2.20 out per 1M tokens

Strengths
Long-context reasoning, document analysis, multi-step planning. Strongest open-weights model for ingesting an entire codebase or RFP and reasoning across it.

Watch out
Slower than smaller models on short prompts. Overkill for simple classification.

Recommended for

  • DevOps incident response with full context
  • Long RFP and tender drafting
  • AZ-104 and DP-203 syllabus reasoning
  • Multi-document Q&A

GLM 4.6

glm-4.6 · Zhipu AI · MIT

Context: 200,000 tokens · Speed: ~120 tok/s
Price: $0.50 in / $1.50 out per 1M tokens

Strengths
Tool calling, structured JSON output, agentic loops. Frontier-level on coding benchmarks while being half the price of Kimi.

Watch out
Smaller context than Kimi. English-language bias on some classification tasks.

Recommended for

  • KQL and SQL generation
  • Architecture trade-off rationale (AZ-305)
  • Tool-calling agents
  • KYC adverse-media classification
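To make the tool-calling workload concrete, here is what an OpenAI-style function-calling payload for glm-4.6 might look like. The request shape and the `flag_entity` tool are illustrative assumptions — this page does not document the hosted API:

```python
# Illustrative OpenAI-style function-calling payload for glm-4.6.
# The endpoint shape and tool name are assumptions, not the documented API.
payload = {
    "model": "glm-4.6",
    "messages": [
        {"role": "user", "content": "Screen this entity for adverse media."},
    ],
    "tools": [{
        "type": "function",
        "function": {
            "name": "flag_entity",  # hypothetical tool for a KYC pipeline
            "parameters": {
                "type": "object",
                "properties": {
                    "risk": {"type": "string", "enum": ["low", "medium", "high"]},
                    "reason": {"type": "string"},
                },
                "required": ["risk", "reason"],
            },
        },
    }],
}
```

The JSON-Schema `parameters` block is what makes the output machine-checkable — exactly the structured-output job GLM 4.6 wins in our evals.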

Qwen 3 Coder

qwen-3-coder · Alibaba · Apache 2.0

Context: 128,000 tokens · Speed: ~140 tok/s
Price: $0.30 in / $1.20 out per 1M tokens

Strengths
Code generation, refactoring, code review. Specialized post-training on programming benchmarks. Best in class for Terraform, Bicep, Python, TypeScript.

Watch out
Narrower scope than the general models. Weaker on free-form reasoning.

Recommended for

  • CI failure summarization
  • PR review automation
  • Lab walkthrough auto-grading (AI-102)
  • Terraform from intent

MiniMax M2.5

minimax-m2.5 · MiniMax · Apache 2.0

Context: 256,000 tokens · Speed: ~110 tok/s
Price: $0.40 in / $1.60 out per 1M tokens

Strengths
Multilingual reasoning, especially DE, FR, NL, IT, ES, PL, SV. Strong glossary-aware translation that preserves code blocks and technical terms.

Watch out
Less competitive on US English benchmarks than Kimi or GLM.

Recommended for

  • Course material translation
  • Multilingual onboarding flows
  • MS-102 Microsoft 365 multilingual scenarios
  • EU SaaS localised marketing

Llama 3.3 70B

llama-3.3-70b · Meta · Llama 3 Community License

Context: 128,000 tokens · Speed: ~160 tok/s
Price: $0.20 in / $0.80 out per 1M tokens

Strengths
The price-performance default. Cheapest model in the catalog. Solid general-purpose reasoning, strong instruction following.

Watch out
Far smaller context window than Kimi for long-context work. Beaten by GLM on tool calling.

Recommended for

  • Customer support copilots at volume
  • Bulk content classification
  • In-product Q&A on indexed help content
  • Cost-sensitive workloads
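The list prices above make cost comparisons straightforward. A small sketch using the catalog's per-1M-token prices — the token counts per call are illustrative, not measured:

```python
# Per-call cost from catalog list prices (USD per 1M tokens).
PRICES = {
    "llama-3.3-70b": (0.20, 0.80),  # (input, output)
    "kimi-k2.5": (0.55, 2.20),
}

def cost_usd(model: str, tokens_in: int, tokens_out: int) -> float:
    """Cost of one call at list price."""
    p_in, p_out = PRICES[model]
    return (tokens_in * p_in + tokens_out * p_out) / 1_000_000

# A bulk-classification call of 500 input / 20 output tokens (illustrative):
per_call = cost_usd("llama-3.3-70b", 500, 20)  # $0.000116 per call
```

At those assumed sizes, a million classification calls on Llama land around $116, versus roughly $319 on Kimi — the gap that makes Llama the bulk-workload default.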

Mistral Large 3

mistral-large-3 · Mistral AI · Mistral Research / Commercial

Context: 256,000 tokens · Speed: ~100 tok/s
Price: $0.70 in / $2.10 out per 1M tokens

Strengths
European-trained, strong on French, German, Italian, Spanish. The model your French clients will recognise by name. Strong on regulated-industry phrasing.

Watch out
Pricier than Llama for similar tasks. License has commercial-use restrictions worth checking.

Recommended for

  • French and Italian regulated content
  • EU AI Act compliance drafting
  • Public-sector tender drafting (FR)
  • Brand-safe European copy

Choosing the right model

Default to Llama 3.3 70B, upgrade when you need to

For most workloads and budgets, Llama 3.3 70B is the sensible default. Move to Kimi K2.5 when context exceeds 100K tokens, to GLM 4.6 for tool-calling agents, to Qwen 3 Coder for code, to MiniMax M2.5 for multilingual work, and to Mistral Large 3 when your French enterprise client asks who built the model.
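That decision rule can be sketched as a small router. The 100K-token threshold and the flags come from the guidance above; everything else (function name, argument names, the priority order among flags) is an assumption:

```python
# Minimal sketch of "default to Llama 3.3 70B, upgrade when you need to".
# Flag names and their priority order are illustrative assumptions.
def choose_model(context_tokens: int = 0,
                 tool_calling: bool = False,
                 code: bool = False,
                 multilingual: bool = False,
                 fr_enterprise: bool = False) -> str:
    if context_tokens > 100_000:
        return "kimi-k2.5"        # only catalog model with a 1M-token window
    if fr_enterprise:
        return "mistral-large-3"  # European-trained, recognised by name
    if code:
        return "qwen-3-coder"
    if tool_calling:
        return "glm-4.6"
    if multilingual:
        return "minimax-m2.5"
    return "llama-3.3-70b"        # the price-performance default
```

A router like this keeps the upgrade decision in one auditable place instead of scattered across prompts and services.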

See the model docs