Quickstart
Three steps: get an API key from your dashboard, set it as an environment variable, and send a request.
1. Sign up at cloud-horizons.com/ai and generate a key.
2. Export it:

   ```shell
   export CLOUD_HORIZONS_KEY=ch-...
   ```

3. Send a request:

   ```shell
   curl https://api.cloudhorizons.ai/v1/chat/completions \
     -H "Authorization: Bearer $CLOUD_HORIZONS_KEY" \
     -H "Content-Type: application/json" \
     -d '{
       "model": "kimi-k2.5",
       "messages": [
         {"role": "user", "content": "Hello, in two lines."}
       ]
     }'
   ```

Authentication
Bearer token in the Authorization header. Same shape as OpenAI.
```
Authorization: Bearer ch-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```

Keys are scoped per project. Rotate them from the dashboard; old keys are revoked instantly. The team plan supports SSO and per-seat keys.
Endpoints
Stable v1 surface. We mirror the OpenAI Chat Completions, Embeddings, and Models endpoints. The Responses API arrives later in 2026.
| Method | Path | Purpose |
|---|---|---|
| POST | /v1/chat/completions | Generate a chat completion. OpenAI-compatible. |
| POST | /v1/embeddings | Vector embeddings for retrieval and clustering. |
| GET | /v1/models | List available models with their context windows. |
| POST | /v1/moderations | Classify content for safety policies. |
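For example, `GET /v1/models` can be expected to return OpenAI's list envelope; the `context_window` field name in this parsing sketch is an assumption, so check a real response before relying on it:

```python
import json

def context_windows(body: str) -> dict:
    # Map model id -> context window from a GET /v1/models response body.
    # The "context_window" key is assumed, not confirmed by these docs.
    return {m["id"]: m.get("context_window") for m in json.loads(body)["data"]}

sample = '{"object": "list", "data": [{"id": "kimi-k2.5", "context_window": 200000}]}'
```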
Models
Pass the model slug in the request body. Aliases like `latest-reasoning` and `latest-coder` point at the current best model in each category.

- `kimi-k2.5` — long-doc reasoning, 200K context
- `minimax-m2.5` — multilingual chat, agent loops
- `glm-4.6` — tool use, structured output
- `qwen-3-coder` — pure code generation
- `llama-3.3-70b` — general-purpose chat
- `mistral-large-3` — European language fluency
- `bge-m3` — embeddings, 1024 dims
Streaming
Set `stream: true` and we emit Server-Sent Events with the same delta shape OpenAI uses. The Python and Node SDKs handle the parsing for you.
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cloudhorizons.ai/v1",
    api_key="ch-...",
)

stream = client.chat.completions.create(
    model="glm-4.6",
    messages=[{"role": "user", "content": "Stream a haiku"}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)
```

Tool calling
Define tools as JSON schemas; the model returns a tool call when it needs one. Same surface as OpenAI's `tools` parameter.
```python
tools = [{
    "type": "function",
    "function": {
        "name": "get_alerts",
        "description": "Fetch active PagerDuty incidents",
        "parameters": {
            "type": "object",
            "properties": {
                "service_id": {"type": "string"},
                "since": {"type": "string", "format": "date-time"},
            },
            "required": ["service_id"],
        },
    },
}]

resp = client.chat.completions.create(
    model="glm-4.6",
    messages=[{"role": "user", "content": "Any active alerts on api-gateway since 1pm?"}],
    tools=tools,
)

call = resp.choices[0].message.tool_calls[0]
print(call.function.name, call.function.arguments)
```

Embeddings
BGE-M3 by default. 1024 dimensions, multilingual, strong on retrieval. Pass a list of strings, get a list of vectors back.
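Downstream, retrieval typically ranks candidates by cosine similarity over these vectors; a dependency-free sketch:

```python
import math

def cosine(a: list, b: list) -> float:
    # Cosine similarity: dot product over the product of magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm
```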
```python
resp = client.embeddings.create(
    model="bge-m3",
    input=["EU AI Act compliance", "Schrems II ruling"],
)

for emb in resp.data:
    print(len(emb.embedding))  # 1024 dimensions
```

Errors
JSON body, OpenAI-shaped. The `type` field is the stable contract; `message` is human-readable and may change.
```json
{
  "error": {
    "type": "rate_limited",
    "message": "Per-minute token limit reached. Retry after 12s.",
    "retry_after_seconds": 12
  }
}
```

| Status | Type | When it happens |
|---|---|---|
| 400 | invalid_request | Malformed JSON, unsupported model, missing field |
| 401 | unauthorized | API key missing or revoked |
| 402 | quota_exceeded | Plan quota hit, top up or upgrade |
| 413 | context_too_long | Input exceeds the model context window |
| 429 | rate_limited | Per-minute or burst limit hit, retry with backoff |
| 500 | upstream_error | Inference backend unhealthy, retried with backoff |
| 503 | capacity_exhausted | No GPU capacity in the region right now |
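A retry sketch for `429` and `500`: read `retry_after_seconds` when the body carries it, otherwise back off exponentially with jitter (the HTTP call itself is left out; only the delay logic is shown):

```python
import json
import random

def retry_delay(status: int, body: str, attempt: int, cap: float = 30.0) -> float:
    # Prefer the server's hint on 429; otherwise exponential backoff + jitter.
    if status == 429:
        err = json.loads(body).get("error", {})
        if "retry_after_seconds" in err:
            return float(err["retry_after_seconds"])
    return min(cap, 2.0 ** attempt) * random.uniform(0.5, 1.0)
```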
Rate limits
Limits are per project. We return `429` with a `retry_after_seconds` field; the SDKs honor it automatically.
| Plan | Requests / min | Tokens / min | Burst |
|---|---|---|---|
| Personal | 120 req/min | 500K tok/min | 10 concurrent |
| Team | 600 req/min | 4M tok/min | 40 concurrent |
| Custom | Negotiated | Negotiated | Negotiated |
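Client-side, the burst column maps naturally onto a semaphore; a sketch capping in-flight requests at the personal plan's 10 slots (the sleep stands in for the real HTTP call):

```python
import asyncio

async def bounded(sem: asyncio.Semaphore, i: int) -> int:
    async with sem:
        await asyncio.sleep(0)  # stand-in for the actual API request
        return i

async def run_all(n: int, limit: int = 10) -> list:
    # Personal plan allows 10 concurrent requests.
    sem = asyncio.Semaphore(limit)
    return list(await asyncio.gather(*(bounded(sem, i) for i in range(n))))
```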
Data handling
Your prompts and completions are not used to train any model. Inference happens in EU regions only. Logs are kept for debugging and abuse detection for 7 days by default, configurable to 0 days on the team plan. Bring your own key for total isolation.
Standard EU Data Processing Agreement available on team plans. Sub-processors disclosed at /ai/changelog.
SDKs
We do not ship our own SDK. The OpenAI clients work as-is; just override the base URL.
- Python: `openai` v1+, set `base_url`
- Node: `openai` v4+, set `baseURL`
- Go: `openai-go` with custom `option.WithBaseURL`
- LangChain: `ChatOpenAI(base_url=...)`
- LiteLLM: route as `openai/<model>` with `api_base`
- Vercel AI SDK: custom OpenAI provider, set `baseURL`
Next
Join the private beta
Waitlist members get a year of the personal plan at half price and first access when keys ship.
Join waitlist