Anthropic Messages

Overview

CometAPI natively supports the Anthropic Messages API, giving you direct access to Claude models along with Anthropic-specific features. Use this endpoint when you need Claude-only capabilities such as extended thinking, prompt caching, and effort control.

Quick Start

Use the official Anthropic SDK and point its base URL at CometAPI:

import anthropic

client = anthropic.Anthropic(
    base_url="https://api.cometapi.com",
    api_key="<COMETAPI_KEY>",
)

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
)
print(message.content[0].text)

Both the x-api-key and Authorization: Bearer headers are supported for authentication. The official Anthropic SDK uses x-api-key by default.
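The two authentication styles can be sketched without the SDK. This is a minimal illustration that only builds the header dictionary and sends nothing; `build_headers` is a hypothetical helper name, and the `anthropic-version` value is the default listed in the Headers section.

```python
def build_headers(api_key: str, use_bearer: bool = False) -> dict:
    """Build headers for a POST to /v1/messages (hypothetical helper)."""
    headers = {
        "content-type": "application/json",
        "anthropic-version": "2023-06-01",  # documented default version
    }
    if use_bearer:
        # Alternative style: standard Authorization header
        headers["Authorization"] = f"Bearer {api_key}"
    else:
        # Style used by the official Anthropic SDK
        headers["x-api-key"] = api_key
    return headers


sdk_style = build_headers("<COMETAPI_KEY>")
bearer_style = build_headers("<COMETAPI_KEY>", use_bearer=True)
print(sdk_style)
print(bearer_style)
```

Either dictionary could then be passed to the HTTP client of your choice.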

Extended Thinking

Enable Claude's step-by-step reasoning with the thinking parameter. The response then includes thinking content blocks that show Claude's internal reasoning before the final answer.

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000,
    },
    messages=[
        {"role": "user", "content": "Prove that there are infinitely many primes."}
    ],
)

for block in message.content:
    if block.type == "thinking":
        print(f"Thinking: {block.thinking[:200]}...")
    elif block.type == "text":
        print(f"Answer: {block.text}")

Thinking requires a budget_tokens value of at least 1,024. Thinking tokens count toward the max_tokens limit, so set max_tokens high enough to cover both the thinking and the response.
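One way to keep the two limits consistent is to derive max_tokens from the thinking budget plus the room you want for the answer. The sketch below assumes a hypothetical helper, `thinking_request_params`; it is not part of the SDK and only prepares keyword arguments.

```python
MIN_THINKING_BUDGET = 1024  # minimum budget_tokens stated above


def thinking_request_params(budget_tokens: int, answer_tokens: int) -> dict:
    """Return max_tokens and thinking settings that fit together (hypothetical helper)."""
    if budget_tokens < MIN_THINKING_BUDGET:
        raise ValueError(f"budget_tokens must be at least {MIN_THINKING_BUDGET}")
    if answer_tokens < 1:
        raise ValueError("answer_tokens must be at least 1")
    return {
        # Thinking tokens count toward max_tokens, so reserve room for both
        "max_tokens": budget_tokens + answer_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
    }


params = thinking_request_params(budget_tokens=10000, answer_tokens=6000)
print(params["max_tokens"])  # 16000
```

The result can be unpacked into the call shown earlier, for example `client.messages.create(model=..., messages=..., **params)`.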

Prompt Caching

Cache large system prompts or conversation prefixes to reduce latency and cost on subsequent requests. Add cache_control to the content blocks you want cached:

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are an expert code reviewer. [Long detailed instructions...]",
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Review this code..."}],
)

Cache usage is reported in the usage field of the response:

cache_creation_input_tokens: tokens written to the cache (billed at a higher rate)
cache_read_input_tokens: tokens read from the cache (billed at a discounted rate)

Prompt caching requires the cached content block to contain at least 1,024 tokens. Shorter content is not cached.
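To check whether a request wrote to or read from the cache, you can inspect those two usage fields. The following is a small sketch over a plain dictionary; `describe_cache_usage` is a hypothetical helper and the sample numbers are made up for illustration.

```python
def describe_cache_usage(usage: dict) -> str:
    """Summarize cache activity from a response's usage object (hypothetical helper)."""
    written = usage.get("cache_creation_input_tokens") or 0
    read = usage.get("cache_read_input_tokens") or 0
    if read > 0:
        return f"cache hit: {read} tokens read"
    if written > 0:
        return f"cache write: {written} tokens stored"
    return "no cache activity"


# Illustrative values only, not real API output
first_call = {"input_tokens": 12, "cache_creation_input_tokens": 2048, "cache_read_input_tokens": 0}
second_call = {"input_tokens": 12, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 2048}

print(describe_cache_usage(first_call))
print(describe_cache_usage(second_call))
```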

Streaming

Set stream: true to stream the response using Server-Sent Events (SSE). Events arrive in the following order:

message_start: contains the message metadata and initial usage
content_block_start: marks the start of each content block
content_block_delta: incremental text chunks (text_delta)
content_block_stop: marks the end of each content block
message_delta: the final stop_reason and full usage
message_stop: signals the end of the stream

with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=256,
    messages=[{"role": "user", "content": "Hello"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="")
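If you consume the raw events instead of the SDK's text_stream, the event sequence listed above can be folded into a final text and stop reason. The sketch below works on already-decoded event dictionaries; the sample events are hand-written, and their field layout (`delta.type`, `delta.text`, `delta.stop_reason`) is an assumption based on the Anthropic event format rather than captured output.

```python
def collect_stream(events):
    """Accumulate text deltas and the final stop_reason from decoded SSE events."""
    text_parts = []
    stop_reason = None
    for event in events:
        if event["type"] == "content_block_delta":
            delta = event["delta"]
            # Only text deltas carry answer text; skip other delta kinds
            if delta.get("type") == "text_delta":
                text_parts.append(delta["text"])
        elif event["type"] == "message_delta":
            stop_reason = event["delta"].get("stop_reason")
    return "".join(text_parts), stop_reason


# Hand-written sample events following the order described above
sample_events = [
    {"type": "message_start", "message": {"role": "assistant"}},
    {"type": "content_block_start", "index": 0, "content_block": {"type": "text", "text": ""}},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hel"}},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "lo!"}},
    {"type": "content_block_stop", "index": 0},
    {"type": "message_delta", "delta": {"stop_reason": "end_turn"}, "usage": {"output_tokens": 4}},
    {"type": "message_stop"},
]

text, stop_reason = collect_stream(sample_events)
print(text, stop_reason)
```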

Effort Control

Use output_config.effort to control how much effort Claude puts into generating a response:

message = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=4096,
    messages=[
        {"role": "user", "content": "Summarize this briefly."}
    ],
    output_config={"effort": "low"},  # "low", "medium", or "high"
)

Server Tools

Claude supports server-side tools that run on Anthropic's infrastructure:

Web Fetch
Web Search

Web Fetch retrieves and analyzes content from a URL:

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Analyze the content at https://arxiv.org/abs/1512.03385"}
    ],
    tools=[
        {"type": "web_fetch_20250910", "name": "web_fetch", "max_uses": 5}
    ],
)

Web Search searches the web for real-time information:

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "What are the latest developments in AI?"}
    ],
    tools=[
        {"type": "web_search_20250305", "name": "web_search", "max_uses": 5}
    ],
)

Response Example

A typical response returned by CometAPI's Anthropic endpoint:

{
  "id": "msg_bdrk_01UjHdmSztrL7QYYm7CKBDFB",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello!"
    }
  ],
  "model": "claude-sonnet-4-6",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 19,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 0,
    "cache_creation": {
      "ephemeral_5m_input_tokens": 0,
      "ephemeral_1h_input_tokens": 0
    },
    "output_tokens": 4
  }
}
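As a minimal sketch of reading such a response, the snippet below parses an abbreviated copy of the JSON above, joins the text blocks, and adds up the token counts. Summing input_tokens and output_tokens is only an illustration of field access, not a billing formula.

```python
import json

# Abbreviated copy of the example response shown above
raw = """
{
  "id": "msg_bdrk_01UjHdmSztrL7QYYm7CKBDFB",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello!"}],
  "model": "claude-sonnet-4-6",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {"input_tokens": 19, "output_tokens": 4}
}
"""

response = json.loads(raw)

# A response may contain several blocks; keep only the text ones
answer = "".join(
    block["text"] for block in response["content"] if block["type"] == "text"
)
usage = response["usage"]
total_tokens = usage["input_tokens"] + usage["output_tokens"]

print(answer)        # Hello!
print(total_tokens)  # 23
```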

Key Differences from the OpenAI-Compatible Endpoint

Feature	Anthropic Messages (`/v1/messages`)	OpenAI-Compatible (`/v1/chat/completions`)
Extended thinking	`thinking` parameter with `budget_tokens`	Not supported
Prompt caching	`cache_control` on content blocks	Not supported
Effort control	`output_config.effort`	Not supported
Web fetch/search	Server tools (`web_fetch`, `web_search`)	Not supported
Auth header	`x-api-key` or `Bearer`	`Bearer` only
Response format	Anthropic format (`content` blocks)	OpenAI format (`choices`, `message`)
Models	Claude only	Multi-provider (GPT, Claude, Gemini, etc.)

Authentication

x-api-key

string

header

required

Your CometAPI key passed via the x-api-key header. Authorization: Bearer <key> is also supported.

Headers

anthropic-version

string

default: 2023-06-01

The Anthropic API version to use. Defaults to 2023-06-01.

Example:

"2023-06-01"

anthropic-beta

string

Comma-separated list of beta features to enable. Examples: max-tokens-3-5-sonnet-2024-07-15, pdfs-2024-09-25, output-128k-2025-02-19.

Body

application/json

model

string

required

The Claude model to use. See the Models page for current Claude model IDs.

Example:

"claude-sonnet-4-6"

messages

object[]

required

The conversation messages. Must alternate between user and assistant roles. Each message's content can be a string or an array of content blocks (text, image, document, tool_use, tool_result). There is a limit of 100,000 messages per request.
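The alternation rule can be checked on the client before a request is sent. This is a minimal sketch; `validate_messages` is a hypothetical helper and enforces only the role rules stated above.

```python
def validate_messages(messages: list) -> None:
    """Raise ValueError if roles are invalid or do not alternate (hypothetical helper)."""
    if not messages:
        raise ValueError("messages must not be empty")
    for index, message in enumerate(messages):
        role = message["role"]
        if role not in ("user", "assistant"):
            raise ValueError(f"message {index}: unexpected role {role!r}")
        # Two consecutive messages with the same role break the alternation
        if index > 0 and role == messages[index - 1]["role"]:
            raise ValueError(f"message {index}: role {role!r} repeats")


conversation = [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi, how can I help?"},
    {"role": "user", "content": [{"type": "text", "text": "Summarize our chat."}]},
]
validate_messages(conversation)  # passes silently
```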

max_tokens

integer

required

The maximum number of tokens to generate. The model may stop before reaching this limit. When using thinking, the thinking tokens count towards this limit.

Required range: x >= 1

Example:

1024

system

System prompt providing context and instructions to Claude. Can be a plain string or an array of content blocks (useful for prompt caching).

temperature

number

default: 1

Controls randomness in the response. Range: 0.0–1.0. Use lower values for analytical tasks and higher values for creative tasks. Defaults to 1.0.

Required range: 0 <= x <= 1

top_p

number

Nucleus sampling threshold. Only tokens with cumulative probability up to this value are considered. Range: 0.0–1.0. Use either temperature or top_p, not both.

Required range: 0 <= x <= 1
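The "either temperature or top_p, not both" guidance and the documented ranges can be enforced with a small client-side check. The sketch below uses a hypothetical helper, `sampling_params`, and treats the guidance as a hard rule, which is a design choice of this example rather than documented API behavior.

```python
def sampling_params(temperature=None, top_p=None) -> dict:
    """Return validated sampling settings, allowing only one of the two knobs."""
    if temperature is not None and top_p is not None:
        raise ValueError("set either temperature or top_p, not both")
    params = {}
    for name, value in (("temperature", temperature), ("top_p", top_p)):
        if value is None:
            continue
        # Both parameters share the documented range 0 <= x <= 1
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
        params[name] = value
    return params


analytical = sampling_params(temperature=0.2)
print(analytical)
```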

top_k

integer

Only sample from the top K most probable tokens. Recommended for advanced use cases only.

Required range: x >= 0

stream

boolean

default: false

If true, stream the response incrementally using Server-Sent Events (SSE). Events include message_start, content_block_start, content_block_delta, content_block_stop, message_delta, and message_stop.

stop_sequences

string[]

Custom strings that cause the model to stop generating when encountered. The stop sequence is not included in the response.

thinking

object

Enable extended thinking — Claude's step-by-step reasoning process. When enabled, the response includes thinking content blocks before the answer. Requires a minimum budget_tokens of 1,024.

tools

object[]

Tools the model may use. Supports client-defined functions, web search (web_search_20250305), web fetch (web_fetch_20250910), code execution (code_execution_20250522), and more.

tool_choice

object

Controls how the model uses tools.

metadata

object

Request metadata for tracking and analytics.

output_config

object

Configuration for output behavior.

service_tier

enum<string>

The service tier to use. auto tries priority capacity first, standard_only uses only standard capacity.

Available options:

auto,

standard_only

Response

200 - application/json

Successful response. When stream is true, the response is a stream of SSE events.

id

string

Unique identifier for this message (e.g., msg_01XFDUDYJgAACzvnptvVoYEL).

type

enum<string>

Always message.

Available options:

message

role

enum<string>

Always assistant.

Available options:

assistant

content

object[]

The response content blocks. May include text, thinking, tool_use, and other block types.

model

string

The specific model version that generated this response (e.g., claude-sonnet-4-6).

stop_reason

enum<string>

Why the model stopped generating.

Available options:

end_turn,

max_tokens,

stop_sequence,

tool_use,

pause_turn

stop_sequence

string | null

The stop sequence that caused the model to stop, if applicable.

usage

object

Token usage statistics.
