Anthropic Messages

Overview

CometAPI natively supports the Anthropic Messages API, giving you direct access to Claude models with all Anthropic-specific features. Use this endpoint for Claude-only features such as extended thinking, prompt caching, and effort control.

Quickstart

Use the official Anthropic SDK and point its base URL at CometAPI:

import anthropic

client = anthropic.Anthropic(
    base_url="https://api.cometapi.com",
    api_key="<COMETAPI_KEY>",
)

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
)
print(message.content[0].text)

Authentication accepts either the x-api-key header or the Authorization: Bearer header. The official Anthropic SDK sends x-api-key by default.
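For clients that do not use the SDK, the sketch below builds (but does not send) a raw request showing the two accepted authentication headers side by side. It assumes the full URL is the quickstart base URL plus the /v1/messages path from the comparison table; build_messages_request and its use_bearer flag are hypothetical names used only for illustration.

```python
import json
import urllib.request

# Assumption: base URL from the quickstart plus the /v1/messages path.
MESSAGES_URL = "https://api.cometapi.com/v1/messages"


def build_messages_request(api_key: str, body: dict, use_bearer: bool = False) -> urllib.request.Request:
    """Hypothetical helper: build an unsent POST request for the Messages API."""
    headers = {
        "content-type": "application/json",
        "anthropic-version": "2023-06-01",
    }
    if use_bearer:
        # Alternative scheme that CometAPI also accepts.
        headers["Authorization"] = f"Bearer {api_key}"
    else:
        # Scheme the official Anthropic SDK uses by default.
        headers["x-api-key"] = api_key
    return urllib.request.Request(
        MESSAGES_URL,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


req = build_messages_request(
    "<COMETAPI_KEY>",
    {
        "model": "claude-sonnet-4-6",
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)
# To send it: urllib.request.urlopen(req)
```

Note that urllib stores header names capitalized (for example X-api-key); HTTP header names are case-insensitive, so this does not affect the request.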

Extended Thinking

Use the thinking parameter to enable Claude's step-by-step reasoning. The response then contains thinking content blocks that show Claude's internal reasoning before the final answer.

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000,
    },
    messages=[
        {"role": "user", "content": "Prove that there are infinitely many primes."}
    ],
)

for block in message.content:
    if block.type == "thinking":
        print(f"Thinking: {block.thinking[:200]}...")
    elif block.type == "text":
        print(f"Answer: {block.text}")

Thinking requires budget_tokens of at least 1,024. Thinking tokens count toward the max_tokens limit, so set max_tokens high enough to fit both the thinking and the final response.
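The two constraints above can be checked before a request is sent. This is a minimal sketch; thinking_request_params is a hypothetical helper, and requiring budget_tokens to be strictly below max_tokens is an assumption drawn from the fact that thinking tokens count toward max_tokens.

```python
MIN_THINKING_BUDGET = 1024  # minimum budget_tokens stated above


def thinking_request_params(max_tokens: int, budget_tokens: int) -> dict:
    """Hypothetical helper: return max_tokens/thinking kwargs checked against the documented limits."""
    if budget_tokens < MIN_THINKING_BUDGET:
        raise ValueError(f"budget_tokens must be at least {MIN_THINKING_BUDGET}")
    # Thinking tokens count toward max_tokens, so leave room for the answer itself.
    if budget_tokens >= max_tokens:
        raise ValueError("max_tokens must be larger than budget_tokens")
    return {
        "max_tokens": max_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
    }


params = thinking_request_params(max_tokens=16000, budget_tokens=10000)
# message = client.messages.create(model="claude-sonnet-4-6", messages=[...], **params)
```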

Prompt Caching

To reduce latency and cost on subsequent requests, you can cache large system prompts or conversation prefixes. Add cache_control to the content block you want to cache:

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are an expert code reviewer. [Long detailed instructions...]",
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Review this code..."}],
)

Cache usage is reported in the usage field of the response:

cache_creation_input_tokens: tokens written to the cache (billed at a higher rate)
cache_read_input_tokens: tokens read from the cache (billed at a discounted rate)

Prompt caching requires at least 1,024 tokens in the content being cached. Shorter content is not cached.
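To check whether caching is taking effect, you can summarize the usage object of a response. This is a sketch over a plain dict (for example the usage object of the raw JSON response); cache_summary is a hypothetical helper, and it assumes, as in Anthropic's own API, that input_tokens counts only the uncached input, so the full prompt size is the sum of the three fields.

```python
def cache_summary(usage: dict) -> dict:
    """Hypothetical helper: summarize prompt-cache activity from a response's usage object."""
    uncached = usage.get("input_tokens", 0) or 0
    written = usage.get("cache_creation_input_tokens", 0) or 0
    read = usage.get("cache_read_input_tokens", 0) or 0
    # Assumption: input_tokens excludes cached tokens, so the prompt total is the sum of all three.
    total = uncached + written + read
    return {
        "total_input_tokens": total,
        "cache_write_tokens": written,
        "cache_read_tokens": read,
        "cache_hit_ratio": read / total if total else 0.0,
    }


# Illustrative numbers: a second request that reuses a cached system prompt.
summary = cache_summary(
    {"input_tokens": 50, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 1950}
)
```

A hit ratio near 1 on repeated requests suggests the cached prefix is being reused; a persistent 0 usually means the prefix changed between requests or was below the minimum size.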

Streaming

Set stream: true to stream the response with Server-Sent Events (SSE). Events arrive in the following order:

message_start: message metadata and initial usage
content_block_start: marks the start of each content block
content_block_delta: incremental text chunks (text_delta)
content_block_stop: marks the end of each content block
message_delta: final stop_reason and final usage
message_stop: marks the end of the stream
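The SDK's stream helper handles this parsing for you. For a raw HTTP client, consuming the event sequence above might look like the following minimal sketch. It assumes each event is an event: line followed by a JSON data: line, as in Anthropic's stream format; iter_sse_events and collect_text are hypothetical helpers, and any other event types (for example ping or error) are simply ignored here.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Yield (event_name, payload) pairs from raw SSE lines (no id/retry handling)."""
    event: Optional[str] = None
    data_lines = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event.
            if data_lines:
                yield event or "message", json.loads("\n".join(data_lines))
            event, data_lines = None, []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
    if data_lines:  # stream ended without a trailing blank line
        yield event or "message", json.loads("\n".join(data_lines))


def collect_text(lines: Iterable[str]) -> Tuple[str, Optional[str]]:
    """Join text_delta chunks and return (text, stop_reason); other events are ignored."""
    parts, stop_reason = [], None
    for name, payload in iter_sse_events(lines):
        if name == "content_block_delta" and payload["delta"].get("type") == "text_delta":
            parts.append(payload["delta"]["text"])
        elif name == "message_delta":
            stop_reason = payload["delta"].get("stop_reason")
        elif name == "message_stop":
            break
    return "".join(parts), stop_reason


# Hand-written sample in the event order listed above (not captured from a real response).
SAMPLE_STREAM = """event: message_start
data: {"type": "message_start", "message": {"id": "msg_1", "role": "assistant", "content": []}}

event: content_block_start
data: {"type": "content_block_start", "index": 0, "content_block": {"type": "text", "text": ""}}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hel"}}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "lo!"}}

event: content_block_stop
data: {"type": "content_block_stop", "index": 0}

event: message_delta
data: {"type": "message_delta", "delta": {"stop_reason": "end_turn", "stop_sequence": null}, "usage": {"output_tokens": 3}}

event: message_stop
data: {"type": "message_stop"}
"""
```

With this sample, collect_text(SAMPLE_STREAM.splitlines()) returns ("Hello!", "end_turn").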

with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=256,
    messages=[{"role": "user", "content": "Hello"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="")

Effort control

Use output_config.effort to control how much effort Claude spends generating a response.

message = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=4096,
    messages=[
        {"role": "user", "content": "Summarize this briefly."}
    ],
    output_config={"effort": "low"},  # "low", "medium", or "high"
)
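When sending raw JSON instead of using the SDK, effort is a field of the output_config object in the request body. This sketch sets it and rejects values other than the three listed in the comment above; with_effort is a hypothetical helper, and it is an assumption that no other effort values are accepted.

```python
import json

# The three values listed in the example above.
EFFORT_LEVELS = ("low", "medium", "high")


def with_effort(body: dict, effort: str) -> dict:
    """Hypothetical helper: return a copy of a Messages request body with output_config.effort set."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"effort must be one of {EFFORT_LEVELS}, got {effort!r}")
    updated = dict(body)
    # Merge into any existing output_config instead of replacing it.
    updated["output_config"] = {**body.get("output_config", {}), "effort": effort}
    return updated


body = with_effort(
    {
        "model": "claude-opus-4-6",
        "max_tokens": 4096,
        "messages": [{"role": "user", "content": "Summarize this briefly."}],
    },
    "low",
)
print(json.dumps(body["output_config"]))  # {"effort": "low"}
```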

Server tools

Claude supports server-side tools that run on Anthropic's infrastructure.

Web Fetch

Fetches content from a URL and analyzes it.

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Analyze the content at https://arxiv.org/abs/1512.03385"}
    ],
    tools=[
        {"type": "web_fetch_20250910", "name": "web_fetch", "max_uses": 5}
    ],
)

Web Search

Searches the web for real-time information.

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "What are the latest developments in AI?"}
    ],
    tools=[
        {"type": "web_search_20250305", "name": "web_search", "max_uses": 5}
    ],
)

Example response

A typical response from CometAPI's Anthropic endpoint:

{
  "id": "msg_bdrk_01UjHdmSztrL7QYYm7CKBDFB",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello!"
    }
  ],
  "model": "claude-sonnet-4-6",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 19,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 0,
    "cache_creation": {
      "ephemeral_5m_input_tokens": 0,
      "ephemeral_1h_input_tokens": 0
    },
    "output_tokens": 4
  }
}
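Because content is a list of typed blocks (text, thinking, tool_use, and others), code that reads a response should filter by block type rather than assume content[0] is text; with thinking enabled, the first block is a thinking block. A sketch using a trimmed copy of the example above; response_text is a hypothetical helper.

```python
import json


def response_text(response: dict) -> str:
    """Hypothetical helper: join all text blocks, skipping thinking, tool_use and other types."""
    return "".join(
        block["text"] for block in response.get("content", []) if block.get("type") == "text"
    )


# Trimmed copy of the example response above.
raw = (
    '{"type": "message", "role": "assistant", '
    '"content": [{"type": "text", "text": "Hello!"}], '
    '"stop_reason": "end_turn", "usage": {"input_tokens": 19, "output_tokens": 4}}'
)
response = json.loads(raw)
print(response_text(response))  # Hello!
print(response["stop_reason"], response["usage"]["output_tokens"])  # end_turn 4
```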

Key differences from the OpenAI-compatible endpoint

Feature	Anthropic Messages (`/v1/messages`)	OpenAI-Compatible (`/v1/chat/completions`)
Extended thinking	`thinking` parameter with `budget_tokens`	Not available
Prompt caching	`cache_control` on content blocks	Not available
Effort control	`output_config.effort`	Not available
Web fetch/search	Server tools (`web_fetch`, `web_search`)	Not available
Auth header	`x-api-key` or `Bearer`	`Bearer` only
Response format	Anthropic format (`content` blocks)	OpenAI format (`choices`, `message`)
Models	Claude only	Multi-provider (GPT, Claude, Gemini, etc.)
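Another practical difference when porting code: on this endpoint the system prompt is the top-level system body parameter, and messages carries only user and assistant turns. This sketch moves OpenAI-style system messages into that parameter; split_system is a hypothetical helper, and it assumes every message's content is a plain string.

```python
from typing import List, Optional, Tuple


def split_system(openai_messages: List[dict]) -> Tuple[Optional[str], List[dict]]:
    """Hypothetical helper: move OpenAI-style system messages into a top-level system string."""
    system_parts, messages = [], []
    for msg in openai_messages:
        if msg["role"] == "system":
            system_parts.append(msg["content"])
        else:
            # Only user/assistant turns belong in the messages array.
            messages.append({"role": msg["role"], "content": msg["content"]})
    system = "\n\n".join(system_parts) if system_parts else None
    return system, messages


system, messages = split_system(
    [
        {"role": "system", "content": "You are a terse assistant."},
        {"role": "user", "content": "Hello!"},
    ]
)
extra = {"system": system} if system else {}
# message = client.messages.create(model="claude-sonnet-4-6", max_tokens=1024, messages=messages, **extra)
```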

Authorizations

x-api-key (string, header, required)

Your CometAPI key, passed in the x-api-key header. Authorization: Bearer <key> is also supported.

Headers

anthropic-version (string, default: 2023-06-01)

The Anthropic API version to use.

Example: "2023-06-01"

anthropic-beta (string)

Comma-separated list of beta features to enable. Examples: max-tokens-3-5-sonnet-2024-07-15, pdfs-2024-09-25, output-128k-2025-02-19.
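Because the header value is a single comma-separated string, a small helper can assemble it from a list. This is a sketch; beta_header is a hypothetical name, and the SDK call in the comment passes the result through the SDK's extra_headers request option.

```python
from typing import Dict, Iterable


def beta_header(features: Iterable[str]) -> Dict[str, str]:
    """Hypothetical helper: build the anthropic-beta header, dropping blanks and duplicates."""
    cleaned = (f.strip() for f in features)
    unique = list(dict.fromkeys(f for f in cleaned if f))  # keeps first-seen order
    return {"anthropic-beta": ",".join(unique)} if unique else {}


headers = beta_header(["pdfs-2024-09-25", "output-128k-2025-02-19", "pdfs-2024-09-25"])
# message = client.messages.create(..., extra_headers=headers)
```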

Body

application/json

model (string, required)

The Claude model to use. See the Models page for current Claude model IDs.

Example: "claude-sonnet-4-6"

messages (object[], required)

The conversation messages. Must alternate between user and assistant roles. Each message's content can be a string or an array of content blocks (text, image, document, tool_use, tool_result). A request can contain at most 100,000 messages.
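The rules above can be checked client-side before sending. A minimal sketch; check_messages is a hypothetical helper that enforces only the constraints stated here (user/assistant roles, strict alternation, at most 100,000 messages).

```python
MAX_MESSAGES = 100_000  # per-request limit stated above


def check_messages(messages: list) -> None:
    """Hypothetical check: raise ValueError if the messages array breaks the rules above."""
    if len(messages) > MAX_MESSAGES:
        raise ValueError(f"too many messages: {len(messages)} > {MAX_MESSAGES}")
    previous = None
    for i, msg in enumerate(messages):
        role = msg["role"]
        if role not in ("user", "assistant"):
            raise ValueError(f"messages[{i}]: unsupported role {role!r}")
        if role == previous:
            raise ValueError(f"messages[{i}]: two consecutive {role!r} messages")
        previous = role


check_messages(
    [
        {"role": "user", "content": "Hello!"},
        {"role": "assistant", "content": "Hi! How can I help?"},
        {"role": "user", "content": "Tell me a joke."},
    ]
)
```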

max_tokens (integer, required)

The maximum number of tokens to generate. The model may stop before reaching this limit. When thinking is enabled, thinking tokens count toward this limit.

Range: x >= 1

Example: 1024

system (string | object[])

System prompt that gives Claude context and instructions. Can be a plain string or an array of content blocks (useful for prompt caching).

temperature (number, default: 1)

Controls randomness in the response. Use lower values for analytical tasks and higher values for creative tasks.

Range: 0 <= x <= 1

top_p (number)

Nucleus sampling threshold: only tokens within this cumulative probability are considered. Use either temperature or top_p, not both.

Range: 0 <= x <= 1

top_k (integer)

Sample only from the K most probable tokens. Recommended for advanced use cases only.

Range: x >= 0
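The sampling parameters above come with ranges and one mutual-exclusion rule (temperature or top_p, not both). A sketch that builds them as keyword arguments; sampling_params is a hypothetical helper that checks only the constraints listed on this page.

```python
from typing import Optional


def sampling_params(
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    top_k: Optional[int] = None,
) -> dict:
    """Hypothetical helper: build sampling kwargs within the documented ranges."""
    if temperature is not None and top_p is not None:
        raise ValueError("use either temperature or top_p, not both")
    params = {}
    if temperature is not None:
        if not 0 <= temperature <= 1:
            raise ValueError("temperature must be between 0 and 1")
        params["temperature"] = temperature
    if top_p is not None:
        if not 0 <= top_p <= 1:
            raise ValueError("top_p must be between 0 and 1")
        params["top_p"] = top_p
    if top_k is not None:
        if top_k < 0:
            raise ValueError("top_k must be >= 0")
        params["top_k"] = top_k
    return params


# Low temperature for an analytical task.
params = sampling_params(temperature=0.2)
# message = client.messages.create(model="claude-sonnet-4-6", max_tokens=1024, messages=[...], **params)
```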

stream (boolean, default: false)

If true, stream the response incrementally using Server-Sent Events (SSE). Events include message_start, content_block_start, content_block_delta, content_block_stop, message_delta, and message_stop.

stop_sequences (string[])

Custom strings that make the model stop generating when encountered. The matched stop sequence is not included in the response.

thinking (object)

Enables extended thinking (Claude's step-by-step reasoning). When enabled, the response includes thinking content blocks before the answer. Requires budget_tokens of at least 1,024.

tools (object[])

Tools the model may use. Supports client-defined functions, web search (web_search_20250305), web fetch (web_fetch_20250910), code execution (code_execution_20250522), and more.

tool_choice (object)

Controls how the model uses tools.

metadata (object)

Request metadata for tracking and analytics.

output_config (object)

Configuration for output behavior, such as effort.

service_tier (enum<string>)

The service tier to use. auto tries priority capacity first; standard_only uses only standard capacity.

Available options: auto, standard_only

Response

200 - application/json

Successful response. When stream is true, the response is a stream of SSE events instead.

id (string)

Unique identifier for this message (e.g., msg_01XFDUDYJgAACzvnptvVoYEL).

type (enum<string>)

Always message.

role (enum<string>)

Always assistant.

content (object[])

The response content blocks. May include text, thinking, tool_use, and other block types.

model (string)

The specific model version that generated this response (e.g., claude-sonnet-4-6).

stop_reason (enum<string>)

Why the model stopped generating.

Available options: end_turn, max_tokens, stop_sequence, tool_use, pause_turn
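A client loop typically branches on stop_reason. The mapping below is an illustrative sketch: next_action and its action labels are hypothetical, and the reading of pause_turn (a paused long-running server-tool turn that can be resumed by sending the response back) follows Anthropic's own documentation rather than anything stated on this page.

```python
def next_action(stop_reason: str) -> str:
    """Hypothetical mapping from stop_reason to what a client might do next."""
    actions = {
        "end_turn": "done",         # the model finished its turn naturally
        "stop_sequence": "done",    # one of the custom stop_sequences matched
        "max_tokens": "truncated",  # output hit max_tokens; raise the limit or continue
        "tool_use": "run_tools",    # run the requested client tools and send tool_result blocks
        "pause_turn": "continue",   # assumption: paused server-tool turn; resend to resume
    }
    try:
        return actions[stop_reason]
    except KeyError:
        raise ValueError(f"unknown stop_reason: {stop_reason!r}") from None


# action = next_action(message.stop_reason)
```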

stop_sequence (string | null)

The stop sequence that caused the model to stop, if applicable; otherwise null.

usage (object)

Token usage statistics.
