POST /v1/chat/completions
from openai import OpenAI
client = OpenAI(
    base_url="https://api.cometapi.com/v1",
    api_key="<COMETAPI_KEY>",
)

completion = client.chat.completions.create(
    model="gpt-5.4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
)

print(completion.choices[0].message)
{
  "id": "chatcmpl-DNA27oKtBUL8TmbGpBM3B3zhWgYfZ",
  "object": "chat.completion",
  "created": 1774412483,
  "model": "gpt-4.1-nano-2025-04-14",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Four",
        "refusal": null,
        "annotations": []
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 29,
    "completion_tokens": 2,
    "total_tokens": 31,
    "prompt_tokens_details": {
      "cached_tokens": 0,
      "audio_tokens": 0
    },
    "completion_tokens_details": {
      "reasoning_tokens": 0,
      "audio_tokens": 0,
      "accepted_prediction_tokens": 0,
      "rejected_prediction_tokens": 0
    }
  },
  "service_tier": "default",
  "system_fingerprint": "fp_490a4ad033"
}
CometAPI routes chat completions to multiple providers, including OpenAI, Claude, and Gemini, through a single OpenAI-compatible interface. Switch between models by changing the model parameter; most OpenAI-compatible SDKs work as soon as base_url is set to https://api.cometapi.com/v1.
Different models support different subsets of parameters, and the fields returned in the response vary slightly. For example, reasoning_effort applies only to reasoning models (o-series, GPT-5.1+), and some models do not support logprobs or n > 1.
For OpenAI Pro models, o-series reasoning models, and Codex models, use the Responses endpoint instead. These model families have more complete support on the Responses API.
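
Switching providers is only a matter of changing model; a minimal sketch reusing the client configured above (the Claude model ID below is a placeholder, check the Models page for the IDs actually available on CometAPI):
# Same request body, different providers: only the model parameter changes.
gpt_reply = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello!"}],
)

claude_reply = client.chat.completions.create(
    model="claude-sonnet-4",  # placeholder model ID, see the Models page
    messages=[{"role": "user", "content": "Hello!"}],
)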

Message roles

| Role | Description |
| --- | --- |
| system | Sets the assistant's behavior and personality. Placed at the start of the conversation. |
| developer | Replaces system for newer models (o1+). Provides instructions the model must follow regardless of user input. |
| user | A message from the end user. |
| assistant | A previous model response, used to maintain conversation history. |
| tool | The result of a tool/function call. Must include a tool_call_id matching the original tool call. |
For newer models (GPT-4.1, GPT-5 series, o-series), prefer developer over system for instruction messages. Both work, but developer yields stronger instruction-following behavior.

Sending multimodal input

Many models accept image and audio input alongside text. To send a multimodal message, use the array form of content:
{
  "role": "user",
  "content": [
    {"type": "text", "text": "Describe this image"},
    {
      "type": "image_url",
      "image_url": {
        "url": "https://example.com/image.png",
        "detail": "high"
      }
    }
  ]
}
The detail parameter controls the depth of image analysis:
  • low — faster and uses fewer tokens (fixed cost)
  • high — detailed analysis that consumes more tokens
  • auto — the model decides (default)
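
As a rough sketch, the same message can be sent through the Python SDK, reusing the client configured above:
completion = client.chat.completions.create(
    model="gpt-5.4",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image"},
                {
                    "type": "image_url",
                    # detail="high" requests the more detailed (and more expensive) analysis.
                    "image_url": {"url": "https://example.com/image.png", "detail": "high"},
                },
            ],
        }
    ],
)
print(completion.choices[0].message.content)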

Streaming responses

To receive incremental output, set stream to true. The response is delivered as Server-Sent Events (SSE), where each event carries a chat.completion.chunk object:
data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
To include token usage statistics in a streaming response, set stream_options.include_usage to true. The usage data appears in the final chunk, just before [DONE].
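
With the Python SDK this amounts to iterating over the stream; a minimal sketch, reusing the client configured above:
stream = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in stream:
    # Content arrives as incremental deltas; the final usage chunk has an empty choices list.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
    if chunk.usage:
        print(f"\n[{chunk.usage.total_tokens} tokens]")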

Requesting structured output

To force the model to return valid JSON that conforms to a specific schema, use response_format:
{
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "result",
      "strict": true,
      "schema": {
        "type": "object",
        "properties": {
          "answer": {"type": "string"},
          "confidence": {"type": "number"}
        },
        "required": ["answer", "confidence"],
        "additionalProperties": false
      }
    }
  }
}
JSON Schema mode (json_schema) guarantees the output matches your schema exactly. JSON Object mode (json_object) only guarantees valid JSON; it does not constrain the structure.
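
A sketch of the same schema sent through the Python SDK (client as configured above); the returned content is a JSON string you can parse directly:
import json

completion = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "How many moons does Mars have?"}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "result",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "answer": {"type": "string"},
                    "confidence": {"type": "number"},
                },
                "required": ["answer", "confidence"],
                "additionalProperties": False,
            },
        },
    },
)

result = json.loads(completion.choices[0].message.content)
print(result["answer"], result["confidence"])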

Calling tools and functions

To let the model call external functions, provide tool definitions:
{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {"type": "string", "description": "City name"}
          },
          "required": ["location"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}
When the model decides to call a tool, the response comes back with finish_reason: "tool_calls" and a message.tool_calls array containing the function name and arguments. You then execute the function and send the result back as a tool message with the matching tool_call_id.
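
A sketch of the full round trip, assuming the tools list above is bound to a Python variable named tools and fetch_weather stands in for whatever local code actually produces the result:
import json

messages = [{"role": "user", "content": "What's the weather in Paris?"}]

first = client.chat.completions.create(model="gpt-5.4", messages=messages, tools=tools)
call = first.choices[0].message.tool_calls[0]
args = json.loads(call.function.arguments)

# Run the function locally (fetch_weather is your own code), then return the result
# as a tool message whose tool_call_id matches the original call.
messages.append(first.choices[0].message)
messages.append({
    "role": "tool",
    "tool_call_id": call.id,
    "content": json.dumps(fetch_weather(args["location"])),
})

second = client.chat.completions.create(model="gpt-5.4", messages=messages, tools=tools)
print(second.choices[0].message.content)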

Cross-provider notes

| Parameter | OpenAI GPT | Claude (via compat) | Gemini (via compat) |
| --- | --- | --- | --- |
| temperature | 0–2 | 0–1 | 0–2 |
| top_p | 0–1 | 0–1 | 0–1 |
| n | 1–128 | 1 only | 1–8 |
| stop | up to 4 | up to 4 | up to 5 |
| tools | ✅ | ✅ | ✅ |
| response_format | ✅ | ✅ (json_schema) | ✅ |
| logprobs | ✅ | ❌ | ❌ |
| reasoning_effort | o-series, GPT-5.1+ | ❌ | ❌ (use thinking on native Gemini) |
  • max_tokens — Legacy parameter. Works with most models but is deprecated for newer OpenAI models.
  • max_completion_tokens — Recommended for GPT-4.1, GPT-5 series, and o-series models. Required for reasoning models because it covers both output tokens and reasoning tokens.
CometAPI handles the mapping automatically when routing to other providers.
  • system — The traditional instruction role. Works with every model.
  • developer — Introduced with the o1 models. Provides stronger instruction-following on newer models and falls back to system behavior on older ones.
For new projects targeting GPT-4.1+ or o-series models, use developer.

Frequently asked questions

How do I handle rate limits?

When you hit 429 Too Many Requests, implement exponential backoff:
import time
import random
from openai import OpenAI, RateLimitError

client = OpenAI(
    base_url="https://api.cometapi.com/v1",
    api_key="<COMETAPI_KEY>",
)

def chat_with_retry(messages, max_retries=3):
    for i in range(max_retries):
        try:
            return client.chat.completions.create(
                model="gpt-5.4",
                messages=messages,
            )
        except RateLimitError:
            if i < max_retries - 1:
                wait_time = (2 ** i) + random.random()
                time.sleep(wait_time)
            else:
                raise
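
Calling the wrapper then looks the same as a direct create call:
completion = chat_with_retry([{"role": "user", "content": "Hello!"}])
print(completion.choices[0].message.content)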

How do I maintain conversation context?

Include the full conversation history in the messages array:
messages = [
    {"role": "developer", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is Python?"},
    {"role": "assistant", "content": "Python is a high-level programming language..."},
    {"role": "user", "content": "What are its main advantages?"},
]
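
To keep the conversation going, append the assistant's reply and the next user turn before the next call; a minimal sketch with the client configured above:
completion = client.chat.completions.create(model="gpt-5.4", messages=messages)
reply = completion.choices[0].message

messages.append({"role": "assistant", "content": reply.content})
messages.append({"role": "user", "content": "Show me a short code example."})

followup = client.chat.completions.create(model="gpt-5.4", messages=messages)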

What does finish_reason mean?

| Value | Meaning |
| --- | --- |
| stop | The model finished naturally or hit a stop sequence. |
| length | The max_tokens or max_completion_tokens limit was reached. |
| tool_calls | The model made one or more tool/function calls. |
| content_filter | Output was filtered due to content policy. |
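
For example, you can detect truncated output and react to it; a small sketch:
completion = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Summarize this document."}],
    max_completion_tokens=200,
)

choice = completion.choices[0]
if choice.finish_reason == "length":
    # The output was cut off by the token limit; raise the cap or shorten the prompt.
    print("Truncated output:", choice.message.content)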

How do I control costs?

  1. Use max_completion_tokens to limit output length.
  2. Choose cost-effective models (for example, gpt-5.4-mini or gpt-5.4-nano for simpler tasks).
  3. Keep prompts concise; avoid redundant context.
  4. Monitor token consumption via the usage field in the response, as shown below.
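
For item 4, the usage object on every completion makes the monitoring straightforward:
completion = client.chat.completions.create(
    model="gpt-5.4-mini",
    messages=[{"role": "user", "content": "Hello!"}],
)

u = completion.usage
print(f"prompt={u.prompt_tokens} completion={u.completion_tokens} total={u.total_tokens}")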

Authorization

Authorization
string
header
Required

Bearer token authentication. Use your CometAPI key.

Request body

application/json
model
string
Default: gpt-5.4
Required

Model ID to use for this request. See the Models page for current options.

Example:

"gpt-4.1"

messages
object[]
Required

A list of messages forming the conversation. Each message has a role (system, user, assistant, or developer) and content (text string or multimodal content array).

stream
boolean

If true, partial response tokens are delivered incrementally via server-sent events (SSE). The stream ends with a data: [DONE] message.

temperature
number
Default: 1

Sampling temperature between 0 and 2. Higher values (e.g., 0.8) produce more random output; lower values (e.g., 0.2) make output more focused and deterministic. Recommended to adjust this or top_p, but not both.

Required range: 0 <= x <= 2
top_p
number
Default: 1

Nucleus sampling parameter. The model considers only the tokens whose cumulative probability reaches top_p. For example, 0.1 means only the top 10% probability tokens are considered. Recommended to adjust this or temperature, but not both.

Required range: 0 <= x <= 1
n
integer
Default: 1

Number of completion choices to generate for each input message. Defaults to 1.

stop
string

Up to 4 sequences where the API will stop generating further tokens. Can be a string or an array of strings.

max_tokens
integer

Maximum number of tokens to generate in the completion. The total of input + output tokens is capped by the model's context length.

presence_penalty
number
Default: 0

Number between -2.0 and 2.0. Positive values penalize tokens based on whether they have already appeared, encouraging the model to explore new topics.

Required range: -2 <= x <= 2
frequency_penalty
number
Default: 0

Number between -2.0 and 2.0. Positive values penalize tokens proportionally to how often they have appeared, reducing verbatim repetition.

Required range: -2 <= x <= 2
logit_bias
object

A JSON object mapping token IDs to bias values from -100 to 100. The bias is added to the model's logits before sampling. Values between -1 and 1 subtly adjust likelihood; -100 or 100 effectively ban or force selection of a token.

user
string

A unique identifier for your end-user. Helps with abuse detection and monitoring.

max_completion_tokens
integer

An upper bound for the number of tokens to generate, including visible output tokens and reasoning tokens. Use this instead of max_tokens for GPT-4.1+, GPT-5 series, and o-series models.

response_format
object

Specifies the output format. Use {"type": "json_object"} for JSON mode, or {"type": "json_schema", "json_schema": {...}} for strict structured output.

tools
object[]

A list of tools the model may call. Currently supports function type tools.

tool_choice
Default: auto

Controls how the model selects tools. auto (default): model decides. none: no tools. required: must call a tool.

logprobs
boolean
Default: false

Whether to return log probabilities of the output tokens.

top_logprobs
integer

Number of most likely tokens to return at each position (0-20). Requires logprobs to be true.

Required range: 0 <= x <= 20
reasoning_effort
enum<string>

Controls the reasoning effort for o-series and GPT-5.1+ models.

Available options:
low,
medium,
high
stream_options
object

Options for streaming. Only valid when stream is true.

service_tier
enum<string>

Specifies the processing tier.

Available options:
auto,
default,
flex,
priority

Response

200 - application/json

Successful chat completion response.

id
string

Unique completion identifier.

Example:

"chatcmpl-abc123"

object
enum<string>
Available options:
chat.completion
Example:

"chat.completion"

created
integer

Unix timestamp of creation.

Example:

1774412483

model
string

The model used (may include version suffix).

Example:

"gpt-5.4-2025-07-16"

choices
object[]

Array of completion choices.

usage
object
service_tier
string
Example:

"default"

system_fingerprint
string | null
Example:

"fp_490a4ad033"