API Documentation

ApiTopMix is a unified AI model aggregation gateway. Access GPT, Claude, Gemini, DeepSeek, and more through a single, OpenAI-compatible API.

Getting Started

Welcome to the ApiTopMix API. The gateway exposes a single interface to models from multiple AI providers. It is compatible with the OpenAI SDK, so an existing integration can switch by changing only the base URL and API key.

Base URL

BASE https://apitopmix.com

Quick Start

Get up and running in under a minute:

1. Sign up at apitopmix.com and generate an API key from your dashboard.
2. Set the base URL to https://apitopmix.com (for OpenAI SDK clients, use https://apitopmix.com/v1) and add your API key to the Authorization header.
3. Make your first request using any OpenAI-compatible SDK or a simple cURL call.

Quick Test (cURL)

curl https://apitopmix.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4.1",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

⚡

OpenAI SDK Compatible — Simply change the base_url to https://apitopmix.com/v1 in any OpenAI SDK client. No other code changes needed.

ℹ

Anthropic SDK Compatible — For Claude models, you can also use the native Anthropic Messages API format via /v1/messages. Set base_url to https://apitopmix.com in the Anthropic SDK. See Claude Messages API for details.

Authentication

All API requests require authentication via a Bearer token in the Authorization header.

Authorization Header

Authorization: Bearer sk-your-api-key-here

Header	Value	Required
`Authorization`	`Bearer YOUR_API_KEY`	Required
`Content-Type`	`application/json`	Required

⚠

Keep your API key secret. Never expose it in client-side code or public repositories. Use environment variables to store your key securely.
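
As a minimal sketch of the environment-variable advice above, the helper below reads the key at startup and builds the two required headers. The variable name `APITOPMIX_API_KEY` and both function names are assumptions chosen for illustration, not names defined by the API.

```python
import os


def load_api_key(env=os.environ, var_name="APITOPMIX_API_KEY"):
    """Return the API key from an environment mapping (variable name is hypothetical)."""
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable before calling the API.")
    return key


def auth_headers(api_key):
    """Build the Authorization and Content-Type headers listed in the table above."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Failing fast on a missing key keeps a misconfigured deployment from sending unauthenticated requests.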

Chat Completions

Create a chat completion with any supported model. This endpoint is compatible with the OpenAI Chat Completions API.

POST /v1/chat/completions

Request Body

Parameter	Type	Required	Description
`model`	string	Required	Model ID to use, e.g. `gpt-4.1`, `claude-sonnet-4`
`messages`	array	Required	Array of message objects with `role` and `content`
`max_tokens`	integer	Optional	Maximum number of tokens to generate
`temperature`	number	Optional	Sampling temperature between 0 and 2. Default: `1.0`
`stream`	boolean	Optional	Enable Server-Sent Events streaming. Default: `false`
`top_p`	number	Optional	Nucleus sampling parameter. Default: `1.0`

Message Object

Field	Type	Description
`role`	string	One of `system`, `user`, or `assistant`
`content`	string	The text content of the message

Response Format

JSON Response

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1711234567,
  "model": "gpt-4.1",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 9,
    "total_tokens": 21
  }
}
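
For clients that call the endpoint without an SDK, the sketch below shows one way to pull the reply text and token usage out of the response shape shown above. The function name `summarize_completion` is hypothetical; the field names come from the sample response.

```python
import json

SAMPLE_RESPONSE = """
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1711234567,
  "model": "gpt-4.1",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Hello! How can I help you today?"},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 12, "completion_tokens": 9, "total_tokens": 21}
}
"""


def summarize_completion(payload):
    """Extract the first choice's text, its finish reason, and the token total."""
    choice = payload["choices"][0]
    usage = payload.get("usage", {})
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice.get("finish_reason"),
        "total_tokens": usage.get("total_tokens"),
    }


summary = summarize_completion(json.loads(SAMPLE_RESPONSE))
print(summary["text"])
```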

Code Examples

cURL

curl https://apitopmix.com/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4.1",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    "max_tokens": 500,
    "temperature": 0.7
  }'

Python (OpenAI SDK)

from openai import OpenAI

client = OpenAI(
    api_key="sk-your-api-key",
    base_url="https://apitopmix.com/v1"
)

response = client.chat.completions.create(
    model="claude-sonnet-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing."}
    ],
    max_tokens=500,
    temperature=0.7
)

print(response.choices[0].message.content)

JavaScript (Node.js)

import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'sk-your-api-key',
  baseURL: 'https://apitopmix.com/v1'
});

const response = await client.chat.completions.create({
  model: 'gemini-3.1-pro-preview',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Explain quantum computing.' }
  ],
  max_tokens: 500,
  temperature: 0.7
});

console.log(response.choices[0].message.content);

Streaming (SSE)

Enable real-time token streaming by setting stream: true. The response is delivered as Server-Sent Events.

Python Streaming Example

from openai import OpenAI

client = OpenAI(
    api_key="sk-your-api-key",
    base_url="https://apitopmix.com/v1"
)

stream = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "Write a poem about AI."}],
    stream=True
)

for chunk in stream:
    # Some chunks may carry no choices or an empty delta; skip those.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Each SSE event contains a JSON chunk:

SSE Chunk Format

data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"Hello"}}]}

data: [DONE]
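
When reading the stream without an SDK, the chunk format above can be parsed line by line. The following is a sketch under the assumption that every event is a single `data:` line as shown; `iter_sse_text` is a hypothetical helper name.

```python
import json


def iter_sse_text(lines):
    """Yield content deltas from an iterable of SSE lines, stopping at [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


sample_lines = [
    'data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"Hello"}}]}',
    "",
    'data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":" world"}}]}',
    "",
    "data: [DONE]",
]
streamed_text = "".join(iter_sse_text(sample_lines))
```

The same generator can be fed the decoded lines of a streaming HTTP response body.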

Claude Messages API

In addition to the OpenAI-compatible endpoint above, ApiTopMix also natively supports the Anthropic Messages API format. If you are already using the Anthropic SDK, you can connect directly without any format conversion.

ℹ

Two ways to use Claude models: You can call Claude models via the OpenAI-compatible /v1/chat/completions endpoint (shown above), or via the native Anthropic /v1/messages endpoint described here. Both use the same API key and same pricing.

POST /v1/messages

Authentication

The Anthropic format uses the x-api-key header (instead of Authorization: Bearer). You must also include the anthropic-version header.

Header	Value	Required
`x-api-key`	`YOUR_API_KEY`	Required
`anthropic-version`	`2023-06-01`	Required
`Content-Type`	`application/json`	Required

Request Body

Parameter	Type	Required	Description
`model`	string	Required	Claude model ID, e.g. `claude-sonnet-4-6`
`messages`	array	Required	Array of message objects with `role` and `content`
`max_tokens`	integer	Required	Maximum number of tokens to generate
`system`	string	Optional	System prompt (passed as a top-level field, not inside messages)
`temperature`	number	Optional	Sampling temperature between 0 and 1. Default: `1.0`
`stream`	boolean	Optional	Enable Server-Sent Events streaming. Default: `false`
`top_p`	number	Optional	Nucleus sampling parameter

Supported Models

Model ID	Description
`claude-opus-4-6`	Most capable Claude model
`claude-opus-4-6-thinking`	Opus with extended thinking
`claude-opus-4-5-20251101`	Claude Opus 4.5
`claude-opus-4-5-20251101-thinking`	Opus 4.5 with extended thinking
`claude-sonnet-4-6`	Best balance of speed and capability
`claude-sonnet-4-6-thinking`	Sonnet with extended thinking
`claude-sonnet-4-5-20250929`	Claude Sonnet 4.5
`claude-sonnet-4-5-20250929-thinking`	Sonnet 4.5 with extended thinking
`claude-haiku-4-5-20251001`	Fastest and most affordable Claude model
`claude-haiku-4-5-20251001-thinking`	Haiku with extended thinking

Response Format

JSON Response

{
  "id": "msg_01abc123",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello! How can I help you today?"
    }
  ],
  "model": "claude-sonnet-4-6",
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 12,
    "output_tokens": 9
  }
}

Code Examples

cURL

curl https://apitopmix.com/v1/messages \
  -H "x-api-key: $API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "max_tokens": 500,
    "system": "You are a helpful assistant.",
    "messages": [
      {"role": "user", "content": "Explain quantum computing in simple terms."}
    ]
  }'

Python (Anthropic SDK)

import anthropic

client = anthropic.Anthropic(
    api_key="sk-your-api-key",
    base_url="https://apitopmix.com"
)

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=500,
    system="You are a helpful assistant.",
    messages=[
        {"role": "user", "content": "Explain quantum computing."}
    ]
)

print(message.content[0].text)

JavaScript (Anthropic SDK)

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic({
  apiKey: 'sk-your-api-key',
  baseURL: 'https://apitopmix.com'
});

const message = await client.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 500,
  system: 'You are a helpful assistant.',
  messages: [
    { role: 'user', content: 'Explain quantum computing.' }
  ]
});

console.log(message.content[0].text);

Streaming

Enable real-time streaming by setting stream: true.

Python Streaming

import anthropic

client = anthropic.Anthropic(
    api_key="sk-your-api-key",
    base_url="https://apitopmix.com"
)

with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=500,
    messages=[{"role": "user", "content": "Write a poem about AI."}]
) as stream:
    for text in stream.text_stream:
        print(text, end="")

⚡

Anthropic SDK Compatible — Simply set base_url to https://apitopmix.com (without /v1) in the Anthropic SDK client. The SDK appends /v1/messages automatically.

ℹ

Key differences from OpenAI format:

Auth header: x-api-key instead of Authorization: Bearer
System prompt: top-level system field, not a message with role: "system"
Response: content is an array of content blocks, not a single string
max_tokens is required, not optional
Usage fields: input_tokens / output_tokens instead of prompt_tokens / completion_tokens
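
To make these differences concrete, here is a small sketch that converts an OpenAI-style message list into a `/v1/messages` request body and reads the text back out of an Anthropic-style response. Both helper names are hypothetical, and joining multiple system messages with blank lines is an assumption of this sketch.

```python
def to_anthropic_request(model, messages, max_tokens):
    """Move system messages to the top-level `system` field; max_tokens is mandatory."""
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    chat = [m for m in messages if m["role"] != "system"]
    body = {"model": model, "max_tokens": max_tokens, "messages": chat}
    if system_parts:
        body["system"] = "\n\n".join(system_parts)
    return body


def anthropic_text(response):
    """Concatenate the text blocks of an Anthropic-format response."""
    return "".join(
        block["text"] for block in response["content"] if block.get("type") == "text"
    )


body = to_anthropic_request(
    "claude-sonnet-4-6",
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing."},
    ],
    max_tokens=500,
)
```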

Models

List all available models through the API. ApiTopMix aggregates models from leading AI providers.

GET /v1/models

cURL

curl https://apitopmix.com/v1/models \
  -H "Authorization: Bearer $API_KEY"

Supported Providers & Models

Anthropic

Claude Series

Opus 4.6 / 4.5, Sonnet 4.6 / 4.5, Haiku 4.5
Includes standard and thinking modes

Google

Gemini Series

Gemini 3.1 Pro, Gemini 3 Flash
Latest Gemini multimodal models

DeepSeek

V3.2, V3.1, R1 Distill series
Via NVIDIA NIM

Llama Series

Llama 4 Maverick/Scout, Llama 3.3 70B, Llama 3.1 405B
Free via NVIDIA NIM

Mistral

Mistral Series

Large 3 675B, Medium 3, Small 4, Devstral 2
Via NVIDIA NIM

Moonshot

Kimi Series

Kimi K2.5, K2 Instruct, K2 Thinking
Via NVIDIA NIM

Qwen

Qwen Series

Qwen 3.5 397B, Qwen 3 Coder 480B, QwQ 32B
Via NVIDIA NIM

Others

More Models

GLM-5, MiniMax M2.5, Phi-4, GPT-OSS, Nemotron Ultra
50+ models available

Pricing

Model	Input $/1M	Output $/1M	Notes
Anthropic Claude (60% of official price)
`claude-opus-4-6`	$3.00	$15.00
`claude-opus-4-6-thinking`	$3.00	$15.00	Extended thinking
`claude-sonnet-4-6`	$1.80	$9.00
`claude-sonnet-4-6-thinking`	$1.80	$9.00	Extended thinking
`claude-haiku-4-5-20251001`	$0.60	$3.00
Google Gemini (60% of official price)
`gemini-3.1-pro-preview`	$1.20	$7.20	Latest Gemini Pro
`gemini-3-flash-preview`	$0.30	$1.80	Fast & affordable
NVIDIA NIM Models (Official 20% off / Free)
`meta/llama-3.3-70b-instruct`	FREE		Meta Llama 3.3
`meta/llama-3.1-405b-instruct`	FREE		405B parameters
`qwen/qwen3.5-397b-a17b`	$0.02	$0.06	Qwen 3.5
`mistralai/mistral-large-3-675b-instruct-2512`	$0.10	$0.30	Mistral Large 3
`moonshotai/kimi-k2.5`	$0.12	$0.60	Kimi K2.5
`z-ai/glm5`	$0.20	$0.64	GLM-5

ℹ

50+ models available. Use the /v1/models endpoint to get the complete list of model IDs. Pricing for all models is available on the Pricing page (login required).
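
The per-million-token prices above translate into a request cost as follows. This is an illustrative sketch only: the `PRICES` dictionary copies a few rows from the table, and `estimate_cost` is a hypothetical helper, not part of the API.

```python
# (input $/1M tokens, output $/1M tokens), copied from the pricing table above
PRICES = {
    "claude-opus-4-6": (3.00, 15.00),
    "claude-sonnet-4-6": (1.80, 9.00),
    "claude-haiku-4-5-20251001": (0.60, 3.00),
    "gemini-3.1-pro-preview": (1.20, 7.20),
    "gemini-3-flash-preview": (0.30, 1.80),
}


def estimate_cost(model, input_tokens, output_tokens):
    """Return the estimated USD cost of one request for a model listed in PRICES."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Token counts can be taken from the `usage` object of a response.
cost = estimate_cost("claude-sonnet-4-6", input_tokens=1_000_000, output_tokens=1_000_000)
```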

Embeddings

Generate vector embeddings for text inputs. Useful for search, clustering, and retrieval-augmented generation (RAG).

POST /v1/embeddings

Request Body

Parameter	Type	Required	Description
`model`	string	Required	Embedding model ID
`input`	string | array	Required	Text(s) to embed
`encoding_format`	string	Optional	`float` or `base64`. Default: `float`

Supported Models

Model	Dimensions	Best For
`text-embedding-3-large`	3072	Highest accuracy, retrieval tasks
`text-embedding-3-small`	1536	Balanced performance and cost
`text-embedding-ada-002`	1536	Legacy support

Python Example

from openai import OpenAI

client = OpenAI(
    api_key="sk-your-api-key",
    base_url="https://apitopmix.com/v1"
)

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="The quick brown fox jumps over the lazy dog."
)

embedding = response.data[0].embedding
print(f"Dimensions: {len(embedding)}")  # 1536

Response Format

JSON Response

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0023, -0.0091, 0.0152, ...]
    }
  ],
  "model": "text-embedding-3-small",
  "usage": { "prompt_tokens": 9, "total_tokens": 9 }
}
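
For the search use case mentioned at the start of this section, embeddings are usually compared with cosine similarity. The sketch below uses plain Python lists; `cosine_similarity` and `rank_by_similarity` are hypothetical helper names, and the short vectors stand in for real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, candidates):
    """Return candidate names ordered from most to least similar to the query vector."""
    return sorted(
        candidates,
        key=lambda name: cosine_similarity(query, candidates[name]),
        reverse=True,
    )


ranking = rank_by_similarity([1.0, 0.0], {"doc_a": [0.0, 1.0], "doc_b": [0.9, 0.1]})
```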

Music Generation (Suno API)

Generate music tracks using AI. Submit a music generation task and poll for results.

Submit Music Generation

POST /suno/submit/music

Request Body

Parameter	Type	Required	Description
`prompt`	string	Required	Text prompt describing the desired music
`style`	string	Optional	Music genre/style, e.g. `"jazz"`, `"electronic"`, `"pop"`
`title`	string	Optional	Title for the generated track
`lyrics`	string	Optional	Custom lyrics for the track
`make_instrumental`	boolean	Optional	Generate without vocals. Default: `false`

cURL - Submit Music

curl -X POST https://apitopmix.com/suno/submit/music \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A dreamy lo-fi beat with soft piano and rain sounds",
    "style": "lo-fi",
    "title": "Rainy Afternoon",
    "make_instrumental": true
  }'

Submit Response

JSON Response

{
  "code": 200,
  "message": "success",
  "data": {
    "task_id": "suno_abc123def456"
  }
}

Fetch Generation Result

GET /suno/fetch/{id}

Poll this endpoint with the task_id to check generation progress and retrieve the result.

cURL - Fetch Result

curl https://apitopmix.com/suno/fetch/suno_abc123def456 \
  -H "Authorization: Bearer $API_KEY"

JSON Response (Completed)

{
  "code": 200,
  "data": {
    "task_id": "suno_abc123def456",
    "status": "completed",
    "tracks": [
      {
        "title": "Rainy Afternoon",
        "audio_url": "https://cdn.apitopmix.com/audio/...",
        "duration": 120,
        "style": "lo-fi"
      }
    ]
  }
}

ℹ

Async Processing — Music generation is asynchronous. Typical generation takes 30–120 seconds. Poll the fetch endpoint every few seconds until status is "completed" or "failed".
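
The polling loop described above might be sketched as follows. The `fetch` argument is a caller-supplied function that performs the GET request and returns the parsed JSON; the 5-second interval and 300-second timeout are assumptions, and any status other than "completed" or "failed" is treated as still pending.

```python
import time


def poll_until_done(fetch, task_id, interval=5.0, timeout=300.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch(task_id) until the task completes or fails, or the timeout elapses."""
    deadline = clock() + timeout
    while True:
        result = fetch(task_id)
        status = result.get("data", {}).get("status")
        if status in ("completed", "failed"):
            return result
        if clock() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish within {timeout} seconds")
        sleep(interval)
```

With the `requests` library, `fetch` could be a small function that issues `requests.get` against `/suno/fetch/{task_id}` with the Authorization header and returns `.json()`.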

Image Generation

Generate high-quality images from text prompts using AI. Compatible with the OpenAI DALL-E API format.

POST /v1/images/generations

Request Body

Parameter	Type	Required	Description
`model`	string	Required	`"nano-banana-2"`
`prompt`	string	Required	Text description of the image to generate
`n`	integer	Optional	Number of images. Default: 1
`size`	string	Optional	Image size, e.g. `"1024x1024"`
`image_size`	string	Optional	Resolution: `"1K"`, `"2K"`, or `"4K"`. Default: 1K

Supported Models

Model	Price	Features
`nano-banana-2`	$0.082/image	Text-to-image, image-to-image, multi-image input. Supports 1K/2K/4K output.

cURL

curl -X POST https://apitopmix.com/v1/images/generations \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nano-banana-2",
    "prompt": "A majestic snow leopard on a mountain peak at sunrise, digital painting",
    "n": 1,
    "size": "1024x1024"
  }'

Response

{
  "created": 1711234567,
  "data": [
    {
      "url": "https://example.com/generated-image.jpg"
    }
  ]
}

Python Example

from openai import OpenAI

client = OpenAI(
    api_key="sk-your-api-key",
    base_url="https://apitopmix.com/v1"
)

response = client.images.generate(
    model="nano-banana-2",
    prompt="A futuristic city with flying cars at sunset, cyberpunk style",
    n=1,
    size="1024x1024"
)

image_url = response.data[0].url
print(f"Image: {image_url}")

⚡

Tip: Use image_size: "2K" or "4K" for higher resolution output. Provide detailed prompts with art style, lighting, and composition for best results. Failed generations are not charged.

Video Generation

Generate short AI videos from text prompts. Video generation is asynchronous — submit a task, then poll for results.

Submit Video Generation

POST /v1/videos

Request Body

Parameter	Type	Required	Description
`model`	string	Required	`"grok-video-3"`
`prompt`	string	Required	Text description of the video
`duration`	integer	Optional	Length in seconds (5-15). Default: 8
`resolution`	string	Optional	`"480p"` or `"720p"`. Default: 480p
`aspect_ratio`	string	Optional	`"16:9"`, `"9:16"`, `"1:1"`. Default: 16:9

cURL - Submit Video

curl -X POST https://apitopmix.com/v1/videos \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "grok-video-3",
    "prompt": "A golden retriever running on a beach at sunset, cinematic",
    "duration": 10,
    "resolution": "720p"
  }'

Submit Response

{
  "id": "task_abc123xyz",
  "object": "video",
  "status": "queued"
}

Poll Video Status

GET /v1/videos/{task_id}

cURL - Poll Status

curl https://apitopmix.com/v1/videos/task_abc123xyz \
  -H "Authorization: Bearer $API_KEY"

Completed Response

{
  "code": "success",
  "data": {
    "status": "SUCCESS",
    "progress": "100%",
    "data": {
      "output": "https://example.com/generated-video.mp4"
    }
  }
}

Python Example

import os
import time

import requests

BASE = "https://apitopmix.com/v1"
# Prefer the API_KEY environment variable over a hard-coded key.
KEY = os.environ.get("API_KEY", "sk-your-api-key")
H = {"Authorization": f"Bearer {KEY}", "Content-Type": "application/json"}

# Submit the generation task
r = requests.post(f"{BASE}/videos", headers=H, timeout=30, json={
    "model": "grok-video-3",
    "prompt": "A cat watching fish in an aquarium",
    "duration": 10, "resolution": "720p"
})
r.raise_for_status()
task_id = r.json()["id"]
print(f"Submitted: {task_id}")

# Poll every 10 seconds; the 30-attempt cap (about 5 minutes) is an arbitrary safety limit
for _ in range(30):
    s = requests.get(f"{BASE}/videos/{task_id}", headers=H, timeout=30).json()
    d = s.get("data", s)  # status fields may be nested under "data"
    print(f"Status: {d.get('status')} | {d.get('progress')}")
    if d.get("status") == "SUCCESS":
        print(f"Video: {d['data']['output']}")
        break
    elif d.get("status") == "FAILURE":
        print(f"Failed: {d.get('fail_reason')}")
        break
    time.sleep(10)
else:
    print("Timed out waiting for the video")

Pricing

Model	Price	Duration	Resolution
`grok-video-3`	$0.030/sec (40% off official)	1-15 seconds	480p / 720p

Duration	Cost
5 seconds	$0.150
8 seconds	$0.240
10 seconds	$0.300
15 seconds	$0.450

ℹ

Async Processing — Video generation takes 30-60 seconds. Poll every 10 seconds. Download URLs are temporary — save the video promptly.

Error Codes

All errors return a consistent JSON structure with an error object.

Error Response Format

Error Response

{
  "error": {
    "message": "Invalid API key provided.",
    "type": "authentication_error",
    "code": "invalid_api_key"
  }
}

Common Error Codes

HTTP Status	Error Type	Description
400	`invalid_request_error`	The request body is malformed or missing required parameters.
401	`authentication_error`	Invalid or missing API key. Check your `Authorization` header.
403	`permission_error`	Your API key does not have permission to access this resource.
404	`not_found_error`	The requested resource or endpoint does not exist.
429	`rate_limit_error`	Too many requests. Slow down and retry with exponential backoff.
500	`server_error`	An unexpected error occurred on our servers. Try again later.
503	`service_unavailable`	The upstream model provider is temporarily unavailable.

⚠

Retry Strategy: For 429 and 5xx errors, implement exponential backoff starting with a 1-second delay, doubling on each retry up to a maximum of 60 seconds.
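
The retry strategy above could look like the sketch below. The `send` argument is a caller-supplied function that performs one request and returns a `(status_code, body)` pair; that shape, the helper names, and the default of 8 retries are assumptions for illustration.

```python
import time


def backoff_delays(base=1.0, cap=60.0, max_retries=8):
    """Yield delays that start at `base`, double each time, and never exceed `cap`."""
    delay = base
    for _ in range(max_retries):
        yield min(delay, cap)
        delay *= 2


def call_with_retries(send, max_retries=8, sleep=time.sleep):
    """Retry send() on 429 and 5xx responses; return the last (status, body) pair."""
    status, body = send()
    for delay in backoff_delays(max_retries=max_retries):
        if status != 429 and status < 500:
            break  # success or a non-retryable client error
        sleep(delay)
        status, body = send()
    return status, body
```

Production clients often add random jitter to each delay so that many clients do not retry at the same instant.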

Rate Limits

ApiTopMix enforces rate limits to ensure fair usage and platform stability. Limits vary by plan and are applied per API key.

Rate Limit Headers

Every response includes headers indicating your current rate limit status:

Header	Description
`x-ratelimit-limit-requests`	Maximum requests allowed in the current window
`x-ratelimit-remaining-requests`	Remaining requests in the current window
`x-ratelimit-reset-requests`	Time (in seconds) until the request limit resets
`x-ratelimit-limit-tokens`	Maximum tokens allowed per minute
`x-ratelimit-remaining-tokens`	Remaining tokens in the current window

Default Limits

Tier	RPM (Requests/Min)	TPM (Tokens/Min)
Free	10	40,000
Standard	60	200,000
Pro	300	1,000,000
Enterprise	Custom	Custom

ℹ

Need higher limits? Contact us for enterprise-grade rate limits tailored to your use case.

Best Practices

Monitor the x-ratelimit-remaining-requests header to stay within your limits.
Implement exponential backoff when receiving 429 responses.
Cache responses when possible to reduce unnecessary API calls.
Use streaming for long completions to avoid timeout issues.
Batch embedding requests by sending multiple texts in a single call.
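
As an illustration of the batching advice, the sketch below splits a list of texts into fixed-size groups and embeds each group with one call. The `embed_batch` argument is a caller-supplied function (for example, a wrapper around the embeddings endpoint that returns one vector per input), and the batch size of 64 is an arbitrary assumption rather than a documented limit.

```python
def batched(texts, batch_size):
    """Yield consecutive slices of `texts` containing at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]


def embed_all(texts, embed_batch, batch_size=64):
    """Embed every text with one embed_batch call per group, preserving input order."""
    vectors = []
    for batch in batched(texts, batch_size):
        vectors.extend(embed_batch(batch))
    return vectors
```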

Gemini API (Native Format)

Supported Gemini Models

Model	Type	Input / 1M	Output / 1M
`gemini-2.5-flash`	Fast & cheap	$0.30	$1.80
`gemini-2.5-pro`	High quality	$1.25	$5.00
`gemini-3-flash-preview`	Latest fast	$0.30	$1.80
`gemini-3-pro-preview`	Latest flagship	$1.25	$7.50
`gemini-3.1-pro-preview`	Latest flagship+	$1.20	$7.20
`gemini-3.1-flash-lite-preview`	Ultra cheap	$0.15	$0.90

Supported Parameters

Parameter	Type	Status
`contents` (text, image, multi-turn)	array	✔ Supported
`systemInstruction`	object	✔ Supported
`generationConfig.temperature`	float	✔ Supported
`generationConfig.topP`	float	✔ Supported
`generationConfig.topK`	int	✔ Supported
`generationConfig.maxOutputTokens`	int	✔ Supported
`generationConfig.stopSequences`	string[]	✔ Supported
`generationConfig.seed`	int	✔ Supported
`generationConfig.presencePenalty`	float	✔ Supported
`generationConfig.frequencyPenalty`	float	✔ Supported
`generationConfig.thinkingConfig`	object	✔ Supported
`safetySettings`	array	✔ Supported
`tools` (function calling)	array	✔ Supported
`inlineData` (images)	base64	✔ Supported

Vertex AI Endpoint Mapping

Vertex AI URL	ApiTopMix URL
`https://{region}-aiplatform.googleapis.com/v1/projects/{p}/locations/{l}/publishers/google/models/{model}:generateContent`	`https://apitopmix.com/v1/projects/{any}/locations/{any}/publishers/google/models/{model}:generateContent`
`.../{model}:streamGenerateContent`	`https://apitopmix.com/v1/projects/{any}/locations/{any}/publishers/google/models/{model}:streamGenerateContent?alt=sse`
`.../endpoints/openapi/chat/completions`	`https://apitopmix.com/v1/projects/{any}/locations/{any}/endpoints/openapi/chat/completions`
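
The mapping above can be expressed as a small URL builder. This sketch assumes, based on the `{any}` placeholders in the table, that the project and location segments accept arbitrary values; `apitopmix_gemini_url` is a hypothetical helper name.

```python
def apitopmix_gemini_url(model, stream=False, project="any", location="any"):
    """Build the ApiTopMix URL for a Vertex AI style generateContent call."""
    method = "streamGenerateContent?alt=sse" if stream else "generateContent"
    return (
        "https://apitopmix.com/v1"
        f"/projects/{project}/locations/{location}"
        f"/publishers/google/models/{model}:{method}"
    )


url = apitopmix_gemini_url("gemini-3-flash-preview")
stream_url = apitopmix_gemini_url("gemini-3-flash-preview", stream=True)
```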
