Quickstart
Three steps: get an API key from your dashboard, set it as an environment variable, and send a request.
1. Sign up at cloud-horizons.com/ai and generate a key.
2. Export it:

   ```shell
   export CLOUD_HORIZONS_KEY=ch-...
   ```

3. Send a request:

   ```shell
   curl https://api.cloudhorizons.ai/v1/chat/completions \
     -H "Authorization: Bearer $CLOUD_HORIZONS_KEY" \
     -H "Content-Type: application/json" \
     -d '{
       "model": "kimi-k2.5",
       "messages": [
         {"role": "user", "content": "Hello, in two lines."}
       ]
     }'
   ```

Authentication
Bearer token in the Authorization header. Same shape as OpenAI.
```
Authorization: Bearer ch-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```

Keys are scoped per project. Rotate them from the dashboard; old keys are revoked instantly. The team plan supports SSO and per-seat keys.
Endpoints
Stable v1 surface. We mirror the OpenAI Chat Completions, Embeddings, and Models endpoints. The Responses API arrives later in 2026.
| Method | Path | Purpose |
|---|---|---|
| POST | /v1/chat/completions | Generate a chat completion. OpenAI-compatible. |
| POST | /v1/embeddings | Vector embeddings for retrieval and clustering. |
| GET | /v1/models | List available models with their context windows. |
| POST | /v1/moderations | Classify content for safety policies. |
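For example, `GET /v1/models` can be expected to return OpenAI's list envelope; the `context_window` field name in this parsing sketch is an assumption, so check a real response before relying on it:

```python
import json

def context_windows(body: str) -> dict:
    # Map model id -> context window from a GET /v1/models response body.
    # The "context_window" key is assumed, not confirmed by these docs.
    return {m["id"]: m.get("context_window") for m in json.loads(body)["data"]}

sample = '{"object": "list", "data": [{"id": "kimi-k2.5", "context_window": 200000}]}'
```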
Models
Pass the model slug in the request body. Aliases like `latest-reasoning` and `latest-coder` point at the current best model in each category.

- `kimi-k2.5` — long-doc reasoning, 200K context
- `minimax-m2.5` — multilingual chat, agent loops
- `glm-4.6` — tool use, structured output
- `qwen-3-coder` — pure code generation
- `llama-3.3-70b` — general-purpose chat
- `mistral-large-3` — European language fluency
- `bge-m3` — embeddings, 1024 dims
Streaming
Set `stream: true` and we emit Server-Sent Events with the same delta shape OpenAI uses. The Python and Node SDKs handle the parsing for you.
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cloudhorizons.ai/v1",
    api_key="ch-...",
)

stream = client.chat.completions.create(
    model="glm-4.6",
    messages=[{"role": "user", "content": "Stream a haiku"}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)
```

Tool calling
Define tools as JSON schemas; the model returns a tool call when it needs one. Same surface as OpenAI's `tools` parameter.
```python
tools = [{
    "type": "function",
    "function": {
        "name": "get_alerts",
        "description": "Fetch active PagerDuty incidents",
        "parameters": {
            "type": "object",
            "properties": {
                "service_id": {"type": "string"},
                "since": {"type": "string", "format": "date-time"},
            },
            "required": ["service_id"],
        },
    },
}]

resp = client.chat.completions.create(
    model="glm-4.6",
    messages=[{"role": "user", "content": "Any active alerts on api-gateway since 1pm?"}],
    tools=tools,
)

call = resp.choices[0].message.tool_calls[0]
print(call.function.name, call.function.arguments)
```

Embeddings
BGE-M3 by default. 1024 dimensions, multilingual, strong on retrieval. Pass a list of strings, get a list of vectors back.
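Downstream, retrieval typically ranks candidates by cosine similarity over these vectors; a dependency-free sketch:

```python
import math

def cosine(a: list, b: list) -> float:
    # Cosine similarity: dot product over the product of magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm
```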
```python
resp = client.embeddings.create(
    model="bge-m3",
    input=["EU AI Act compliance", "Schrems II ruling"],
)

for emb in resp.data:
    print(len(emb.embedding))  # 1024 dimensions
```

Errors
JSON body, OpenAI-shaped. The `type` field is the stable contract; `message` is human-readable and may change.
```json
{
  "error": {
    "type": "rate_limited",
    "message": "Per-minute token limit reached. Retry after 12s.",
    "retry_after_seconds": 12
  }
}
```

| Status | Type | When it happens |
|---|---|---|
| 400 | invalid_request | Malformed JSON, unsupported model, missing field |
| 401 | unauthorized | API key missing or revoked |
| 402 | quota_exceeded | Plan quota hit, top up or upgrade |
| 413 | context_too_long | Input exceeds the model context window |
| 429 | rate_limited | Per-minute or burst limit hit, retry with backoff |
| 500 | upstream_error | Inference backend unhealthy, retried with backoff |
| 503 | capacity_exhausted | No GPU capacity in the region right now |
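A retry sketch for `429` and `500`: read `retry_after_seconds` when the body carries it, otherwise back off exponentially with jitter (the HTTP call itself is left out; only the delay logic is shown):

```python
import json
import random

def retry_delay(status: int, body: str, attempt: int, cap: float = 30.0) -> float:
    # Prefer the server's hint on 429; otherwise exponential backoff + jitter.
    if status == 429:
        err = json.loads(body).get("error", {})
        if "retry_after_seconds" in err:
            return float(err["retry_after_seconds"])
    return min(cap, 2.0 ** attempt) * random.uniform(0.5, 1.0)
```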
Rate limits
Limits are per project. We return `429` with a `retry_after_seconds` field; the SDKs honor it automatically.
| Plan | Requests / min | Tokens / min | Burst |
|---|---|---|---|
| Personal | 120 req/min | 500K tok/min | 10 concurrent |
| Team | 600 req/min | 4M tok/min | 40 concurrent |
| Custom | Negotiated | Negotiated | Negotiated |
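Client-side, the burst column maps naturally onto a semaphore; a sketch capping in-flight requests at the personal plan's 10 slots (the sleep stands in for the real HTTP call):

```python
import asyncio

async def bounded(sem: asyncio.Semaphore, i: int) -> int:
    async with sem:
        await asyncio.sleep(0)  # stand-in for the actual API request
        return i

async def run_all(n: int, limit: int = 10) -> list:
    # Personal plan allows 10 concurrent requests.
    sem = asyncio.Semaphore(limit)
    return list(await asyncio.gather(*(bounded(sem, i) for i in range(n))))
```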
Data handling
Your prompts and completions are not used to train any model. Inference happens in EU regions only. Logs are kept for debugging and abuse detection for 7 days by default, configurable to 0 days on the team plan. Bring your own key for total isolation.
Standard EU Data Processing Agreement available on team plans. Sub-processors disclosed at /ai/changelog.
SDKs
We do not ship our own SDK. The OpenAI clients work as-is; just override the base URL.
- Python: `openai` v1+, set `base_url`
- Node: `openai` v4+, set `baseURL`
- Go: `openai-go` with custom `option.WithBaseURL`
- LangChain: `ChatOpenAI(base_url=...)`
- LiteLLM: route as `openai/<model>` with `api_base`
- Vercel AI SDK: custom OpenAI provider, set `baseURL`
Next
Join the private beta
Waitlist members get a year of the personal plan at half price and first access when keys ship.
Join waitlist