Inference

Multi-provider chat completions API with an OpenAI-compatible interface, billed from Conway credits.

Chat Completions

POST /v1/chat/completions

Multi-provider chat completions endpoint with an OpenAI-compatible interface. Each request is routed to the appropriate provider (OpenAI, Anthropic, Google, Moonshot, or Qwen) based on the model name, and is billed from your Conway credits with a 1.3x markup on token cost.

All responses are returned in OpenAI-compatible format regardless of the upstream provider.

Supports streaming via Server-Sent Events (SSE).

Prerequisites

Authentication with an API key or JWT
Minimum credit balance of 10 cents

Request Body

Parameter	Type	Required	Description
`model`	string	Yes	Model name (e.g. `gpt-5.2`, `claude-sonnet-4.5`, `gemini-2.5-pro`, `kimi-k2.5`)
`messages`	array	Yes	Array of message objects (`{ role, content }`)
`stream`	boolean	No	Enable SSE streaming (default: `false`)
`temperature`	number	No	Sampling temperature
`max_tokens`	number	No	Maximum tokens to generate

All other OpenAI-compatible parameters (`tools`, `tool_choice`, `top_p`, `stop`, etc.) are forwarded to the upstream provider or translated to its equivalent as needed.
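
As a rough illustration of the request body described above, the following Python sketch assembles a payload and checks the two required fields locally before anything is sent. The helper name `build_chat_request` is hypothetical and not part of any Conway SDK; optional parameters are simply passed through.

```python
from typing import Any


def build_chat_request(
    model: str, messages: list[dict[str, Any]], **options: Any
) -> dict[str, Any]:
    """Assemble a chat completions request body (hypothetical helper)."""
    # `model` and `messages` are the only required fields; the API
    # responds with 400 when either is missing.
    if not model:
        raise ValueError("model is required")
    if not messages:
        raise ValueError("messages is required")
    body: dict[str, Any] = {"model": model, "messages": messages}
    # Optional parameters (stream, temperature, max_tokens, tools, top_p, ...)
    # are included only when set, so server-side defaults apply otherwise.
    body.update({key: value for key, value in options.items() if value is not None})
    return body


body = build_chat_request(
    "claude-sonnet-4.5",
    [{"role": "user", "content": "Hello"}],
    max_tokens=100,
)
```

Validating locally only saves a round trip; the server remains the authority on what counts as a valid request.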

Example — OpenAI

curl -X POST https://inference.conway.tech/v1/chat/completions \
  -H "Authorization: Bearer cnwy_k_your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.2",
    "messages": [
      { "role": "user", "content": "Hello" }
    ]
  }'

Example — Anthropic

curl -X POST https://inference.conway.tech/v1/chat/completions \
  -H "Authorization: Bearer cnwy_k_your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4.5",
    "messages": [
      { "role": "user", "content": "Hello" }
    ],
    "max_tokens": 100
  }'

Example — Google Gemini

curl -X POST https://inference.conway.tech/v1/chat/completions \
  -H "Authorization: Bearer cnwy_k_your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-pro",
    "messages": [
      { "role": "user", "content": "Hello" }
    ]
  }'

Example — Kimi

curl -X POST https://inference.conway.tech/v1/chat/completions \
  -H "Authorization: Bearer cnwy_k_your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kimi-k2.5",
    "messages": [
      { "role": "user", "content": "Hello" }
    ]
  }'
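
The curl examples above differ only in the `model` value. A minimal Python sketch of the same request, using only the standard library, might look like the following. The function name `make_request` is an illustrative assumption; the URL, headers, and placeholder key are taken from the curl examples. The request objects are built but not sent.

```python
import json
import urllib.request

API_URL = "https://inference.conway.tech/v1/chat/completions"


def make_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) the POST request used in the curl examples."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Only the model name changes between providers.
requests_by_model = {
    name: make_request("cnwy_k_your-api-key", name, "Hello")
    for name in ("gpt-5.2", "claude-sonnet-4.5", "gemini-2.5-pro", "kimi-k2.5")
}
```

Passing one of these objects to `urllib.request.urlopen` would send it; that step needs a valid API key and a sufficient credit balance.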

Response Format

Responses from every provider are returned in OpenAI-compatible format:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "gpt-5.2",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Hello! How can I help?" },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 8,
    "completion_tokens": 7,
    "total_tokens": 15
  }
}
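
Because the shape is the same for every provider, a single parsing routine should be enough. The sketch below pulls the assistant text and total token count out of a response body; `extract_reply` is a hypothetical name, and the sample is the response above trimmed to the fields the function reads.

```python
import json

SAMPLE_RESPONSE = """
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Hello! How can I help?" },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 8, "completion_tokens": 7, "total_tokens": 15 }
}
"""


def extract_reply(raw: str) -> tuple[str, int]:
    """Return the assistant text and total token count from a response body."""
    data = json.loads(raw)
    # The first choice carries the assistant message.
    text = data["choices"][0]["message"]["content"]
    total_tokens = data["usage"]["total_tokens"]
    return text, total_tokens


reply, tokens = extract_reply(SAMPLE_RESPONSE)
```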

Streaming

Add `"stream": true` to the request body. The response is then delivered as a stream of `data:` lines in SSE format, ending with `data: [DONE]`. Streaming is supported for all providers.
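
A client consuming the stream needs to strip the `data:` prefix, stop at `[DONE]`, and decode each payload. The sketch below assumes each payload is a JSON chunk in OpenAI's streaming shape, with text under `choices[0].delta.content`; that shape is inferred from the OpenAI-compatible claim rather than stated on this page, and `iter_stream_text` is a hypothetical helper.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from SSE lines until the [DONE] sentinel."""
    for line in lines:
        line = line.strip()
        # SSE separates events with blank lines; skip those and non-data fields.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        # Assumed chunk shape (OpenAI streaming format): choices[0].delta.content
        choices = chunk.get("choices") or []
        if not choices:
            continue
        delta = choices[0].get("delta", {})
        if delta.get("content"):
            yield delta["content"]


sample_lines = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"Hel"}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"lo"}}]}',
    "",
    "data: [DONE]",
]
streamed_text = "".join(iter_stream_text(sample_lines))
```

In practice `lines` would be the decoded lines of the HTTP response body, read incrementally rather than from a list.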

Billing

Each request is billed based on token usage:

charged_cents = ceil(token_cost_usd * 100 * 1.3)

Token cost is computed from per-model pricing (input + output tokens)
A 1.3x markup is applied
Credits are deducted after the response completes
Transactions appear in your credit history as type `inference`
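
The formula can be worked through in a few lines of Python. This is a sketch under assumptions: the page does not say how per-model prices are expressed, so the per-million-token prices below are hypothetical, as are the function names. `Decimal` is used here to keep the rounding exact; it is not necessarily how the service computes the charge.

```python
from decimal import Decimal, ROUND_CEILING

MARKUP = Decimal("1.3")


def token_cost_usd(
    prompt_tokens: int,
    completion_tokens: int,
    input_usd_per_mtok: Decimal,   # hypothetical unit: USD per million input tokens
    output_usd_per_mtok: Decimal,  # hypothetical unit: USD per million output tokens
) -> Decimal:
    """Sum the input and output token cost in USD."""
    total = prompt_tokens * input_usd_per_mtok + completion_tokens * output_usd_per_mtok
    return total / Decimal(1_000_000)


def charged_cents(cost_usd: Decimal) -> int:
    """charged_cents = ceil(token_cost_usd * 100 * 1.3)"""
    cents = cost_usd * 100 * MARKUP
    return int(cents.to_integral_value(rounding=ROUND_CEILING))


# 1,000 input tokens at $2/M plus 500 output tokens at $10/M (made-up prices):
cost = token_cost_usd(1000, 500, Decimal("2"), Decimal("10"))  # 0.007 USD
charge = charged_cents(cost)  # 0.91 cents before rounding, so 1 cent is charged
```

Because of the ceiling, any request with a non-zero token cost is charged at least 1 cent.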

Errors

Status	Description
`400`	Missing `model` or `messages`
`401`	Invalid or missing authentication
`402`	Insufficient credits (minimum 10 cents required)
`503`	Inference proxy not configured (missing API key for requested provider)
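
Client code may want to turn these statuses into exceptions with a readable message. A minimal sketch, using the descriptions from the table above; `InferenceError` and `check_status` are hypothetical names, and the handling of statuses not listed in the table is an assumption.

```python
ERROR_MEANINGS = {
    400: "Missing `model` or `messages`",
    401: "Invalid or missing authentication",
    402: "Insufficient credits (minimum 10 cents required)",
    503: "Inference proxy not configured (missing API key for requested provider)",
}


class InferenceError(Exception):
    """Raised for any non-2xx response (hypothetical client-side type)."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(f"{status}: {message}")
        self.status = status


def check_status(status: int) -> None:
    """Raise InferenceError unless the status code signals success."""
    if 200 <= status < 300:
        return
    # Statuses outside the documented table get a generic message.
    raise InferenceError(status, ERROR_MEANINGS.get(status, "Unexpected error"))
```

A caller might catch `InferenceError` and check `error.status == 402` to prompt the user to add credits.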

Supported Models

OpenAI

gpt-5.2, gpt-5.2-codex
gpt-5-mini, gpt-5-nano

Anthropic

claude-opus-4.6, claude-opus-4.5
claude-sonnet-4.5
claude-haiku-4.5

Google Gemini

gemini-2.5-pro, gemini-2.5-flash
gemini-3-pro, gemini-3-flash

Moonshot (Kimi)

kimi-k2.5

Qwen

qwen3-coder
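
The page says requests are routed by model name but does not describe how. One plausible way to picture it is prefix matching on the names listed above, sketched below. The prefix table and `provider_for` are assumptions for illustration, not the service's actual routing logic.

```python
from typing import Optional

# Assumed prefixes, derived from the model names in the lists above.
PROVIDER_PREFIXES = {
    "gpt-": "OpenAI",
    "claude-": "Anthropic",
    "gemini-": "Google",
    "kimi-": "Moonshot",
    "qwen": "Qwen",
}


def provider_for(model: str) -> Optional[str]:
    """Guess the upstream provider from a model name, or None if unrecognized."""
    for prefix, provider in PROVIDER_PREFIXES.items():
        if model.startswith(prefix):
            return provider
    return None
```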
