Anthropic Messages

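Overview

CometAPI natively supports the Anthropic Messages API, giving you direct access to Claude models with all Anthropic-specific features. Use this endpoint for Claude-only capabilities such as extended thinking, prompt caching, and effort control.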

Quick Start

Use the official Anthropic SDK and set the base URL to CometAPI:

import anthropic

client = anthropic.Anthropic(
    base_url="https://api.cometapi.com",
    api_key="<COMETAPI_KEY>",
)

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
)
print(message.content[0].text)

Both the x-api-key and Authorization: Bearer headers are supported for authentication. The official Anthropic SDKs use x-api-key by default.
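For callers that do not use the SDK, the two authentication styles can be sketched as a small header builder. This is a minimal illustration using only the standard library; `build_headers` and its `use_bearer` flag are hypothetical names, and the `anthropic-version` default is the one listed in the Headers reference on this page.

```python
def build_headers(api_key: str, use_bearer: bool = False,
                  version: str = "2023-06-01") -> dict[str, str]:
    """Return headers for a POST to /v1/messages (hypothetical helper)."""
    headers = {
        "content-type": "application/json",
        "anthropic-version": version,
    }
    if use_bearer:
        # OpenAI-style bearer token, also accepted by this endpoint
        headers["Authorization"] = f"Bearer {api_key}"
    else:
        # Anthropic-style header, the SDK default
        headers["x-api-key"] = api_key
    return headers


print(build_headers("<COMETAPI_KEY>"))
print(build_headers("<COMETAPI_KEY>", use_bearer=True))
```

Sending exactly one of the two headers keeps requests unambiguous.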

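Extended Thinking

Enable Claude's step-by-step reasoning with the thinking parameter. The response includes thinking content blocks that show Claude's internal reasoning before the final answer.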

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000,
    },
    messages=[
        {"role": "user", "content": "Prove that there are infinitely many primes."}
    ],
)

for block in message.content:
    if block.type == "thinking":
        print(f"Thinking: {block.thinking[:200]}...")
    elif block.type == "text":
        print(f"Answer: {block.text}")

Thinking requires a minimum budget_tokens of 1,024. Thinking tokens count toward your max_tokens limit, so set max_tokens high enough to cover both the thinking and the response.
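The budget constraints above can be checked before a request is sent. The following is a sketch, not part of any SDK: `thinking_params` is a hypothetical helper, and the rule that `budget_tokens` must stay below `max_tokens` is an assumption based on Anthropic's documented behavior for standard (non-interleaved) thinking.

```python
MIN_THINKING_BUDGET = 1024  # minimum budget_tokens stated above


def thinking_params(max_tokens: int, budget_tokens: int) -> dict:
    """Validate a thinking budget and return request keyword arguments."""
    if budget_tokens < MIN_THINKING_BUDGET:
        raise ValueError(f"budget_tokens must be >= {MIN_THINKING_BUDGET}")
    if budget_tokens >= max_tokens:
        # Assumption: thinking shares the max_tokens limit, so the budget
        # must leave room for the final answer.
        raise ValueError("budget_tokens must be smaller than max_tokens")
    return {
        "max_tokens": max_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
    }


params = thinking_params(max_tokens=16000, budget_tokens=10000)
# Tokens guaranteed to remain for the answer if the full budget is used
print(params["max_tokens"] - params["thinking"]["budget_tokens"])
```

The returned dictionary could be unpacked into `client.messages.create(model=..., messages=..., **params)`.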

Prompt Caching

Cache large system prompts or conversation prefixes to reduce latency and cost on subsequent requests. Add cache_control to the content blocks you want cached:

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are an expert code reviewer. [Long detailed instructions...]",
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Review this code..."}],
)

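Cache usage is reported in the usage field of the response:

cache_creation_input_tokens: tokens written to the cache (billed at a higher rate)
cache_read_input_tokens: tokens read from the cache (billed at a lower rate)

Prompt caching requires the cached content block to contain at least 1,024 tokens. Shorter content is not cached.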
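To check whether a request actually hit the cache, the usage counters above can be summarized as follows. This is a sketch over a plain dictionary; `cache_summary` is a hypothetical helper, the sample numbers are made up, and it assumes (following Anthropic's usage accounting) that `input_tokens` counts only the uncached part of the prompt.

```python
def cache_summary(usage: dict) -> dict:
    """Summarize prompt-cache activity from a response's usage field."""
    written = usage.get("cache_creation_input_tokens") or 0
    read = usage.get("cache_read_input_tokens") or 0
    uncached = usage.get("input_tokens") or 0
    total = written + read + uncached  # assumed total prompt size
    return {
        "total_input_tokens": total,
        "cache_hit": read > 0,
        "read_fraction": read / total if total else 0.0,
    }


# Illustrative values for a second request that reuses a cached system prompt
sample_usage = {
    "input_tokens": 50,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 1950,
    "output_tokens": 120,
}
print(cache_summary(sample_usage))
```

A first request typically shows a non-zero `cache_creation_input_tokens`, and later requests with the same prefix show `cache_read_input_tokens` instead.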

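Streaming

Set stream: true to stream the response using Server-Sent Events (SSE). Events arrive in the following order:

message_start: contains message metadata and initial usage
content_block_start: marks the start of each content block
content_block_delta: incremental text chunks (text_delta)
content_block_stop: marks the end of each content block
message_delta: the final stop_reason and complete usage
message_stop: signals the end of the stream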

with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=256,
    messages=[{"role": "user", "content": "Hello"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="")
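When the raw events are consumed without the SDK's `text_stream` helper, the event order listed above can be handled with a small loop. The sketch below runs over hand-written sample events rather than a live stream; the payload shapes are assumptions modeled on Anthropic's streaming format, and `collect_stream` is a hypothetical name.

```python
# Hand-written events in the documented order (already parsed from SSE JSON)
events = [
    {"type": "message_start", "message": {"usage": {"input_tokens": 8}}},
    {"type": "content_block_start", "index": 0,
     "content_block": {"type": "text", "text": ""}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "text_delta", "text": "Hel"}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "text_delta", "text": "lo!"}},
    {"type": "content_block_stop", "index": 0},
    {"type": "message_delta", "delta": {"stop_reason": "end_turn"},
     "usage": {"output_tokens": 4}},
    {"type": "message_stop"},
]


def collect_stream(events):
    """Join text deltas and capture the final stop_reason."""
    parts, stop_reason = [], None
    for event in events:
        kind = event["type"]
        if kind == "content_block_delta":
            delta = event["delta"]
            if delta.get("type") == "text_delta":
                parts.append(delta["text"])
        elif kind == "message_delta":
            stop_reason = event["delta"].get("stop_reason")
        elif kind == "message_stop":
            break  # end of stream
    return "".join(parts), stop_reason


print(collect_stream(events))
```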

Effort Control

Use output_config.effort to control how much computational effort Claude puts into generating a response:

message = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=4096,
    messages=[
        {"role": "user", "content": "Summarize this briefly."}
    ],
    output_config={"effort": "low"},  # "low", "medium", or "high"
)

Server Tools

Claude supports server-side tools that run on Anthropic's infrastructure:

Web Fetch
Web Search

Use Web Fetch to fetch and analyze content from a URL:

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Analyze the content at https://arxiv.org/abs/1512.03385"}
    ],
    tools=[
        {"type": "web_fetch_20250910", "name": "web_fetch", "max_uses": 5}
    ],
)

Use Web Search to search the web for real-time information:

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "What are the latest developments in AI?"}
    ],
    tools=[
        {"type": "web_search_20250305", "name": "web_search", "max_uses": 5}
    ],
)

Response Example

A typical response from CometAPI's Anthropic endpoint:

{
  "id": "msg_bdrk_01UjHdmSztrL7QYYm7CKBDFB",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello!"
    }
  ],
  "model": "claude-sonnet-4-6",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 19,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 0,
    "cache_creation": {
      "ephemeral_5m_input_tokens": 0,
      "ephemeral_1h_input_tokens": 0
    },
    "output_tokens": 4
  }
}
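The `stop_reason` and `usage` fields of a response like the one above usually drive what the caller does next. The sketch below is illustrative only: `next_step` is a hypothetical helper, the suggested actions are assumptions about typical client behavior, and the `stop_reason` values are the ones listed in the Response reference on this page.

```python
def next_step(stop_reason: str) -> str:
    """Map a stop_reason to a suggested follow-up (illustrative wording)."""
    actions = {
        "end_turn": "done",
        "stop_sequence": "done",
        "max_tokens": "truncated: retry with a larger max_tokens",
        "tool_use": "run the requested tool and send back a tool_result",
        "pause_turn": "send the response back so Claude can continue",
    }
    return actions.get(stop_reason, "unknown stop_reason")


# Abbreviated copy of the example response
response = {
    "stop_reason": "end_turn",
    "usage": {"input_tokens": 19, "output_tokens": 4},
}

usage = response["usage"]
print(next_step(response["stop_reason"]))
print(usage["input_tokens"] + usage["output_tokens"])  # tokens in plus tokens out
```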

Key Differences from the OpenAI-Compatible Endpoint

Feature	Anthropic Messages (`/v1/messages`)	OpenAI-Compatible (`/v1/chat/completions`)
Extended thinking	`thinking` parameter with `budget_tokens`	Not available
Prompt caching	`cache_control` on content blocks	Not available
Effort control	`output_config.effort`	Not available
Web fetch/search	Server tools (`web_fetch`, `web_search`)	Not available
Auth headers	`x-api-key` or `Bearer`	`Bearer` only
Response format	Anthropic format (`content` blocks)	OpenAI format (`choices`, `message`)
Models	Claude only	Multi-provider (GPT, Claude, Gemini, etc.)
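The difference in response format can be made concrete with a small adapter that reshapes an Anthropic-format message into an OpenAI-style dictionary. This is a rough sketch for code that already expects `choices`: `to_openai_shape` and the `FINISH_REASONS` mapping are assumptions made for illustration, not behavior of CometAPI or of either SDK, and non-text blocks are simply dropped.

```python
# Assumed mapping from Anthropic stop_reason to OpenAI finish_reason
FINISH_REASONS = {
    "end_turn": "stop",
    "stop_sequence": "stop",
    "max_tokens": "length",
    "tool_use": "tool_calls",
}


def to_openai_shape(message: dict) -> dict:
    """Flatten Anthropic content blocks into an OpenAI-style response."""
    text = "".join(
        block["text"] for block in message["content"] if block["type"] == "text"
    )
    usage = message["usage"]
    return {
        "choices": [{
            "index": 0,
            "message": {"role": message["role"], "content": text},
            "finish_reason": FINISH_REASONS.get(message["stop_reason"]),
        }],
        "usage": {
            "prompt_tokens": usage["input_tokens"],
            "completion_tokens": usage["output_tokens"],
            "total_tokens": usage["input_tokens"] + usage["output_tokens"],
        },
    }


anthropic_message = {
    "role": "assistant",
    "content": [{"type": "text", "text": "Hello!"}],
    "stop_reason": "end_turn",
    "usage": {"input_tokens": 19, "output_tokens": 4},
}
print(to_openai_shape(anthropic_message)["choices"][0]["message"]["content"])
```

Thinking and tool blocks have no direct equivalent in this flattened shape, which is one reason to call the Anthropic endpoint directly when those features matter.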

Authorization

x-api-key

string

header

Required

Your CometAPI key passed via the x-api-key header. Authorization: Bearer <key> is also supported.

Headers

anthropic-version

string

Default: 2023-06-01

The Anthropic API version to use. Defaults to 2023-06-01.

Example:

"2023-06-01"

anthropic-beta

string

Comma-separated list of beta features to enable. Examples: max-tokens-3-5-sonnet-2024-07-15, pdfs-2024-09-25, output-128k-2025-02-19.

Body

application/json

model

string

Required

The Claude model to use. See the Models page for current Claude model IDs.

Example:

"claude-sonnet-4-6"

messages

object[]

Required

The conversation messages. Must alternate between user and assistant roles. Each message's content can be a string or an array of content blocks (text, image, document, tool_use, tool_result). There is a limit of 100,000 messages per request.

max_tokens

integer

Required

The maximum number of tokens to generate. The model may stop before reaching this limit. When using thinking, the thinking tokens count towards this limit.

Required range: x >= 1

Example:

1024

system

System prompt providing context and instructions to Claude. Can be a plain string or an array of content blocks (useful for prompt caching).

temperature

number

Default: 1

Controls randomness in the response. Range: 0.0–1.0. Use lower values for analytical tasks and higher values for creative tasks. Defaults to 1.0.

Required range: 0 <= x <= 1

top_p

number

Nucleus sampling threshold. Only tokens with cumulative probability up to this value are considered. Range: 0.0–1.0. Use either temperature or top_p, not both.

Required range: 0 <= x <= 1

top_k

integer

Only sample from the top K most probable tokens. Recommended for advanced use cases only.

Required range: x >= 0
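The ranges for temperature, top_p, and top_k, together with the "either temperature or top_p, not both" guidance, can be enforced client-side before sending a request. This is a minimal sketch; `sampling_params` is a hypothetical helper and the error messages are illustrative.

```python
def sampling_params(temperature=None, top_p=None, top_k=None) -> dict:
    """Validate sampling settings and return only the ones that were set."""
    if temperature is not None and top_p is not None:
        raise ValueError("set either temperature or top_p, not both")
    if temperature is not None and not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be within 0.0 and 1.0")
    if top_p is not None and not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be within 0.0 and 1.0")
    if top_k is not None and top_k < 0:
        raise ValueError("top_k must be >= 0")
    candidates = {"temperature": temperature, "top_p": top_p, "top_k": top_k}
    # Omit unset parameters so the API defaults apply
    return {name: value for name, value in candidates.items() if value is not None}


print(sampling_params(temperature=0.2))
print(sampling_params(top_p=0.9, top_k=40))
```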

stream

boolean

Default: false

If true, stream the response incrementally using Server-Sent Events (SSE). Events include message_start, content_block_start, content_block_delta, content_block_stop, message_delta, and message_stop.

stop_sequences

string[]

Custom strings that cause the model to stop generating when encountered. The stop sequence is not included in the response.

thinking

object

Enable extended thinking — Claude's step-by-step reasoning process. When enabled, the response includes thinking content blocks before the answer. Requires a minimum budget_tokens of 1,024.

tools

object[]

Tools the model may use. Supports client-defined functions, web search (web_search_20250305), web fetch (web_fetch_20250910), code execution (code_execution_20250522), and more.

tool_choice

object

Controls how the model uses tools.

metadata

object

Request metadata for tracking and analytics.

output_config

object

Configuration for output behavior.

service_tier

enum<string>

The service tier to use. auto tries priority capacity first, standard_only uses only standard capacity.

Available options:

auto,

standard_only

Response

200 - application/json

Successful response. When stream is true, the response is a stream of SSE events.

id

string

Unique identifier for this message (e.g., msg_01XFDUDYJgAACzvnptvVoYEL).

type

enum<string>

Always message.

Available options:

message

role

enum<string>

Always assistant.

Available options:

assistant

content

object[]

The response content blocks. May include text, thinking, tool_use, and other block types.

model

string

The specific model version that generated this response (e.g., claude-sonnet-4-6).

stop_reason

enum<string>

Why the model stopped generating.

Available options:

end_turn,

max_tokens,

stop_sequence,

tool_use,

pause_turn

stop_sequence

string | null

The stop sequence that caused the model to stop, if applicable.

usage

object

Token usage statistics.
