
Cloud Horizon AI / Docs

API reference

OpenAI-compatible LLM API served from EU data centers. Drop-in replacement for the OpenAI SDK: just swap the base URL. This page covers everything you need to ship a production integration.

Quickstart

Three steps. Get an API key from your dashboard, set it as an environment variable, send a request.

  1. Sign up at cloud-horizons.com/ai and generate a key.
  2. Export it: export CLOUD_HORIZONS_KEY=ch-...
  3. Send a request:
curl https://api.cloudhorizons.ai/v1/chat/completions \
  -H "Authorization: Bearer $CLOUD_HORIZONS_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kimi-k2.5",
    "messages": [
      {"role": "user", "content": "Hello, in two lines."}
    ]
  }'

Authentication

Bearer token in the Authorization header. Same shape as OpenAI.

Authorization: Bearer ch-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Keys are scoped per project. Rotate them from the dashboard; old keys are revoked instantly. The team plan supports SSO and per-seat keys.

Endpoints

Stable v1 surface. We mirror the OpenAI Chat Completions, Embeddings, and Models endpoints. The Responses API arrives later in 2026.

Method  Path                  Purpose
POST    /v1/chat/completions  Generate a chat completion. OpenAI-compatible.
POST    /v1/embeddings        Vector embeddings for retrieval and clustering.
GET     /v1/models            List available models with their context windows.
POST    /v1/moderations       Classify content for safety policies.

Models

Pass the model slug in the request body. Aliases like latest-reasoning and latest-coder point at the current best model in each category.

  • kimi-k2.5 — long-doc reasoning, 200K context
  • minimax-m2.5 — multilingual chat, agent loops
  • glm-4.6 — tool use, structured output
  • qwen-3-coder — pure code generation
  • llama-3.3-70b — general-purpose chat
  • mistral-large-3 — European language fluency
  • bge-m3 — embeddings, 1024 dims

Streaming

Set stream: true and we emit Server-Sent Events with the same delta shape OpenAI uses. The Python and Node SDKs handle the parsing for you.

from openai import OpenAI

client = OpenAI(
    base_url="https://api.cloudhorizons.ai/v1",
    api_key="ch-...",
)

stream = client.chat.completions.create(
    model="glm-4.6",
    messages=[{"role": "user", "content": "Stream a haiku"}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)
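If you consume the stream without an SDK, each SSE line carries a data: payload in the OpenAI delta shape, terminated by a data: [DONE] sentinel. A minimal parser sketch, assuming the events match that shape exactly:

```python
import json

def deltas_from_sse(lines):
    """Yield content fragments from raw SSE lines in the OpenAI delta shape."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank keep-alives and comment lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            yield delta
```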

Tool calling

Define tools as JSON schemas; the model returns a tool call when it needs one. Same surface as OpenAI's tools parameter.

tools = [{
    "type": "function",
    "function": {
        "name": "get_alerts",
        "description": "Fetch active PagerDuty incidents",
        "parameters": {
            "type": "object",
            "properties": {
                "service_id": {"type": "string"},
                "since": {"type": "string", "format": "date-time"},
            },
            "required": ["service_id"],
        },
    },
}]

resp = client.chat.completions.create(
    model="glm-4.6",
    messages=[{"role": "user", "content": "Any active alerts on api-gateway since 1pm?"}],
    tools=tools,
)

call = resp.choices[0].message.tool_calls[0]
print(call.function.name, call.function.arguments)

Embeddings

BGE-M3 by default. 1024 dimensions, multilingual, strong on retrieval. Pass a list of strings, get a list of vectors back.

resp = client.embeddings.create(
    model="bge-m3",
    input=["EU AI Act compliance", "Schrems II ruling"],
)

for emb in resp.data:
    print(len(emb.embedding))  # 1024 dimensions

Errors

JSON body, OpenAI-shaped. The type field is the stable contract; message is human-readable and may change.

{
  "error": {
    "type": "rate_limited",
    "message": "Per-minute token limit reached. Retry after 12s.",
    "retry_after_seconds": 12
  }
}

Status  Type                When it happens
400     invalid_request     Malformed JSON, unsupported model, missing field
401     unauthorized        API key missing or revoked
402     quota_exceeded      Plan quota hit; top up or upgrade
413     context_too_long    Input exceeds the model's context window
429     rate_limited        Per-minute or burst limit hit; retry with backoff
500     upstream_error      Inference backend unhealthy; retried with backoff
503     capacity_exhausted  No GPU capacity in the region right now
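Since the type field is the stable contract, client-side retry logic can key off it rather than the status code. A minimal triage sketch based on the table above:

```python
# Per the error table: rate limits, capacity gaps, and backend failures are
# transient; the 4xx request errors are not and should be fixed, not retried.
RETRYABLE = {"rate_limited", "capacity_exhausted", "upstream_error"}

def is_retryable(error: dict) -> bool:
    """True when an error body warrants a retry with backoff."""
    return error.get("type") in RETRYABLE
```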

Rate limits

Limits are per project. We return 429 with a retry_after_seconds field, which the SDKs honor automatically.

Plan      Requests / min  Tokens / min  Burst
Personal  120 req/min     500K tok/min  10 concurrent
Team      600 req/min     4M tok/min    40 concurrent
Custom    Negotiated      Negotiated    Negotiated

Data handling

Your prompts and completions are not used to train any model. Inference happens in EU regions only. Logs are kept for debugging and abuse detection for 7 days by default, configurable down to 0 days on the team plan. Bring your own key for total isolation.

Standard EU Data Processing Agreement available on team plans. Sub-processors disclosed at /ai/changelog.

SDKs

We do not ship our own SDK. The OpenAI clients work as-is; just override the base URL.

  • Python: openai v1+, set base_url
  • Node: openai v4+, set baseURL
  • Go: openai-go with custom option.WithBaseURL
  • LangChain: ChatOpenAI(base_url=...)
  • LiteLLM: route as openai/<model> with api_base
  • Vercel AI SDK: custom OpenAI provider, set baseURL

Next

Join the private beta

Waitlist members get a year of the personal plan at half price and first access when keys ship.

Join waitlist