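Chat Completions

Overview

The chat completions endpoint is the most widely used API for interacting with large language models. It accepts a conversation made up of multiple messages and returns the model's response. CometAPI routes this endpoint to multiple providers, including OpenAI, Anthropic Claude (via a compatibility layer), and Google Gemini, through a single unified interface. You can switch models by changing only the model parameter.

This endpoint follows the OpenAI Chat Completions format. Most OpenAI-compatible SDKs and tools work with CometAPI once base_url is changed to https://api.cometapi.com/v1.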
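As a rough illustration of what an SDK does under the hood, the sketch below builds (but does not send) a raw HTTP request using only the Python standard library. The `/chat/completions` path is an assumption based on the OpenAI convention this endpoint follows, and `build_chat_request` is a hypothetical helper name.

```python
import json
import urllib.request

BASE_URL = "https://api.cometapi.com/v1"  # base URL stated in this document


def build_chat_request(api_key, model, messages):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        # Assumption: the path follows the OpenAI convention, /chat/completions.
        url=f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request(
    "<COMETAPI_KEY>",
    "gpt-5.4",
    [{"role": "user", "content": "Hello!"}],
)
# To send it: urllib.request.urlopen(request), then json.load() the response body.
```

Switching providers in this sketch means changing only the `model` argument; the URL and headers stay the same.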

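Important notes

Model-specific behavior: different models support different subsets of parameters, and the response fields they return may differ slightly. For example, reasoning_effort applies only to reasoning models (o-series, GPT-5.1+), and some models may not support logprobs or n > 1.

Response passthrough: CometAPI returns model responses unmodified (apart from format normalization when routing across providers), so you receive output consistent with the original API.

OpenAI Pro models: for OpenAI Pro series models (e.g., o1-pro), use the responses endpoint instead.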

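Message roles

Role	Description
`system`	Sets the assistant's behavior and personality. Placed at the start of the conversation.
`developer`	Used in place of `system` on newer models (o1+). Provides instructions the model should follow regardless of user input.
`user`	A message from the end user.
`assistant`	A previous model response, used to maintain conversation history.
`tool`	The result of a tool/function call. Must include a `tool_call_id` matching the original tool call.

For newer models (GPT-4.1, GPT-5 series, o-series), developer is recommended over system for instruction messages. Both work, but developer yields stronger instruction following.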

Multimodal input

Many models support images and audio in addition to text. To send a multimodal message, use the array form of content.

{
  "role": "user",
  "content": [
    {"type": "text", "text": "Describe this image"},
    {
      "type": "image_url",
      "image_url": {
        "url": "https://example.com/image.png",
        "detail": "high"
      }
    }
  ]
}

The detail parameter controls the depth of image analysis.

low: faster and uses fewer tokens (fixed cost)
high: performs detailed analysis and consumes more tokens
auto: the model decides (default)
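A small helper can keep multimodal messages well-formed and reject unsupported detail values before a request is sent. This is a minimal sketch; `image_message` is a hypothetical name, and the accepted values are the three listed above.

```python
VALID_DETAIL = {"low", "high", "auto"}


def image_message(text, image_url, detail="auto"):
    """Return a user message combining a text part and an image part."""
    if detail not in VALID_DETAIL:
        raise ValueError(f"detail must be one of {sorted(VALID_DETAIL)}")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url, "detail": detail}},
        ],
    }


# "low" trades analysis depth for speed and fewer tokens.
message = image_message("Describe this image", "https://example.com/image.png", "low")
```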

Streaming

When stream is set to true, the response is delivered as Server-Sent Events (SSE). Each event contains a chat.completion.chunk object carrying incremental content.

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]

To include token usage statistics in a streaming response, set stream_options.include_usage to true. The usage data appears in the final chunk, immediately before [DONE].
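The event stream above can be reassembled into the full reply by concatenating each chunk's delta content. The sketch below parses the sample lines with the standard library only; `collect_stream_text` is a hypothetical helper, and it assumes a single choice (n = 1) and that a usage chunk, when requested, may arrive with an empty choices array.

```python
import json


def collect_stream_text(lines):
    """Concatenate delta content from SSE lines; return (text, finish_reason, usage)."""
    parts, finish_reason, usage = [], None, None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        if chunk.get("usage"):
            usage = chunk["usage"]  # only present when include_usage is requested
        for choice in chunk.get("choices", []):
            parts.append(choice.get("delta", {}).get("content") or "")
            finish_reason = choice.get("finish_reason") or finish_reason
    return "".join(parts), finish_reason, usage


sample = [
    'data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]
text, reason, usage = collect_stream_text(sample)
```

In a real client the lines would come from the HTTP response body rather than a list; the parsing logic stays the same.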

Structured output

Use response_format to force the model to return valid JSON that matches a specific schema.

{
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "result",
      "strict": true,
      "schema": {
        "type": "object",
        "properties": {
          "answer": {"type": "string"},
          "confidence": {"type": "number"}
        },
        "required": ["answer", "confidence"],
        "additionalProperties": false
      }
    }
  }
}

JSON Schema mode (json_schema) guarantees that the output matches the schema exactly. JSON Object mode (json_object) guarantees only that the output is valid JSON; it does not enforce a structure.
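Even with a schema in place, it can be worth checking the parsed content defensively on the client, for example when the same code path also serves json_object mode. The sketch below is one way to do that for the `result` schema shown above; `parse_result` is a hypothetical helper, not part of any SDK.

```python
import json

# Expected keys and types, mirroring the example schema above.
REQUIRED_TYPES = {"answer": str, "confidence": (int, float)}


def parse_result(content):
    """Parse message content and check it against the expected keys and types."""
    data = json.loads(content)
    for key, expected in REQUIRED_TYPES.items():
        if key not in data:
            raise ValueError(f"missing key: {key}")
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(data[key], bool) or not isinstance(data[key], expected):
            raise ValueError(f"wrong type for {key}")
    return data


result = parse_result('{"answer": "Paris", "confidence": 0.93}')
```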

Tools / function calling

Providing tool definitions lets the model call external functions.

{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {"type": "string", "description": "City name"}
          },
          "required": ["location"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}

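When the model decides to call a tool, the response has finish_reason: "tool_calls" and the message.tool_calls array contains the function name and arguments. You then execute the function and return the result as a tool message carrying the matching tool_call_id.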
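The round trip described above can be sketched as a small dispatcher that runs each requested function and builds the matching tool messages. The assistant message below is a hypothetical example in the OpenAI tool-call shape (arguments arrive as a JSON string), and `get_weather` is a stand-in implementation.

```python
import json


def get_weather(location):
    # Hypothetical stand-in; a real implementation would query a weather service.
    return {"location": location, "forecast": "sunny"}


TOOLS = {"get_weather": get_weather}


def run_tool_calls(assistant_message):
    """Execute each requested tool call and return the matching tool messages."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        func = TOOLS[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])  # arguments are a JSON string
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],  # must match the originating call
            "content": json.dumps(func(**args)),
        })
    return results


# Hypothetical assistant message returned with finish_reason "tool_calls".
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_123",
        "type": "function",
        "function": {"name": "get_weather", "arguments": '{"location": "Tokyo"}'},
    }],
}
tool_messages = run_tool_calls(assistant_message)
```

To continue the conversation, append the assistant message and then the tool messages to the messages array and send the request again.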

Response fields

Field	Description
`id`	Unique completion identifier (e.g., `chatcmpl-abc123`).
`object`	Always `chat.completion`.
`model`	The model that generated the response (may include a version suffix).
`choices`	Array of completion candidates (usually 1, unless `n` > 1).
`choices[].message`	The assistant's response message, containing `role`, `content`, and, where applicable, `tool_calls`.
`choices[].finish_reason`	Why the model stopped: `stop`, `length`, `tool_calls`, or `content_filter`.
`usage`	Token consumption breakdown: `prompt_tokens`, `completion_tokens`, `total_tokens`, and detailed sub-counts.
`system_fingerprint`	Backend configuration fingerprint, useful for reproducibility when debugging.

Cross-provider notes

Parameter support across providers

Parameter	OpenAI GPT	Claude (via compat)	Gemini (via compat)
`temperature`	0–2	0–1	0–2
`top_p`	0–1	0–1	0–1
`n`	1–128	1 only	1–8
`stop`	Up to 4	Up to 4	Up to 5
`tools`	✅	✅	✅
`response_format`	✅	✅ (json_schema)	✅
`logprobs`	✅	❌	❌
`reasoning_effort`	o-series, GPT-5.1+	❌	❌ (use `thinking` with `Gemini native`)
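One way to use this table is to check request parameters on the client before sending. The sketch below transcribes a few rows into a dictionary; the provider keys (`openai`, `claude`, `gemini`) and the `check_params` helper are hypothetical, and the limits are only as current as the table itself.

```python
# Limits transcribed from the table above; provider keys are hypothetical labels.
LIMITS = {
    "openai": {"temperature": (0, 2), "n": (1, 128), "max_stop": 4, "logprobs": True},
    "claude": {"temperature": (0, 1), "n": (1, 1), "max_stop": 4, "logprobs": False},
    "gemini": {"temperature": (0, 2), "n": (1, 8), "max_stop": 5, "logprobs": False},
}


def check_params(provider, params):
    """Return a list of problems; an empty list means the params fit the table."""
    limits, problems = LIMITS[provider], []
    for name in ("temperature", "n"):
        if name in params:
            low, high = limits[name]
            if not low <= params[name] <= high:
                problems.append(f"{name} must be within {low}-{high}")
    stop = params.get("stop")
    if isinstance(stop, list) and len(stop) > limits["max_stop"]:
        problems.append(f"stop accepts at most {limits['max_stop']} sequences")
    if params.get("logprobs") and not limits["logprobs"]:
        problems.append("logprobs is not supported")
    return problems


problems = check_params("claude", {"temperature": 1.5, "n": 2, "logprobs": True})
```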

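max_tokens vs. max_completion_tokens

max_tokens: the legacy parameter. It works with most models but is deprecated for newer OpenAI models.
max_completion_tokens: the recommended parameter for GPT-4.1, GPT-5 series, and o-series models. It is required for reasoning models because it covers both output tokens and reasoning tokens.

CometAPI handles the mapping automatically when routing to different providers.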
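Although the mapping is handled server-side, a client that prefers to be explicit could choose the parameter name itself. The sketch below does so by model ID prefix; the prefix list and the `with_output_limit` helper are assumptions for illustration and would need adjusting to the model IDs actually in use.

```python
# Assumption: newer model families can be recognized by these ID prefixes.
NEWER_PREFIXES = ("gpt-4.1", "gpt-5", "o1", "o3", "o4")


def with_output_limit(payload, limit):
    """Return a copy of payload with the output cap under a suitable parameter name."""
    if payload["model"].startswith(NEWER_PREFIXES):
        key = "max_completion_tokens"  # covers visible output and reasoning tokens
    else:
        key = "max_tokens"  # legacy parameter
    return {**payload, key: limit}


newer = with_output_limit({"model": "gpt-5.4", "messages": []}, 256)
legacy = with_output_limit({"model": "some-legacy-model", "messages": []}, 256)
```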

system vs. developer role

system: the legacy instruction role. Works with all models.
developer: introduced with the o1 models. Provides stronger instruction following on newer models and falls back to system behavior on older models.

Use developer for new projects targeting GPT-4.1+ or o-series models.

FAQ

How do I handle rate limits?

If you receive 429 Too Many Requests, implement exponential backoff.

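import time
import random
from openai import OpenAI, RateLimitError

# Point the OpenAI SDK at CometAPI.
client = OpenAI(
    base_url="https://api.cometapi.com/v1",
    api_key="<COMETAPI_KEY>",
)

def chat_with_retry(messages, max_retries=3):
    """Call the endpoint, retrying on 429 with exponential backoff plus jitter."""
    for i in range(max_retries):
        try:
            return client.chat.completions.create(
                model="gpt-5.4",
                messages=messages,
            )
        except RateLimitError:
            if i < max_retries - 1:
                # Wait about 1 s, 2 s, 4 s, ... plus up to 1 s of random jitter.
                wait_time = (2 ** i) + random.random()
                time.sleep(wait_time)
            else:
                # Out of retries: surface the error to the caller.
                raise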

How do I maintain conversation context?

Include the full conversation history in the messages array.

messages = [
    {"role": "developer", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is Python?"},
    {"role": "assistant", "content": "Python is a high-level programming language..."},
    {"role": "user", "content": "What are its main advantages?"},
]
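In practice this means appending both the user message and the model's reply to the same list after every turn. The sketch below shows that bookkeeping; `add_turn` is a hypothetical helper, and `fake_reply` stands in for a real API call so the example runs offline.

```python
def add_turn(history, user_text, reply_fn):
    """Append the user message, obtain a reply, and append it for the next turn."""
    history.append({"role": "user", "content": user_text})
    reply = reply_fn(history)  # the full history is sent on every call
    history.append({"role": "assistant", "content": reply})
    return reply


def fake_reply(messages):
    # Stand-in for client.chat.completions.create(...).choices[0].message.content
    return f"reply to message {len(messages)}"


history = [{"role": "developer", "content": "You are a helpful assistant."}]
add_turn(history, "What is Python?", fake_reply)
add_turn(history, "What are its main advantages?", fake_reply)
```

Because every request carries the whole list, long conversations cost more input tokens per turn.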

What does `finish_reason` mean?

Value	Meaning
`stop`	The model finished naturally or reached a stop sequence.
`length`	The `max_tokens` or `max_completion_tokens` limit was reached.
`tool_calls`	The model made one or more tool/function calls.
`content_filter`	The output was filtered by content policy.
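Client code typically branches on this value. The sketch below maps each value in the table to a suggested follow-up; the action names are illustrative labels, not API values.

```python
# Action names are illustrative; only the keys come from the table above.
ACTIONS = {
    "stop": "use_content",
    "length": "raise_limit_or_continue",
    "tool_calls": "run_tools",
    "content_filter": "handle_filtered",
}


def next_action(choice):
    """Map a choice's finish_reason to a suggested follow-up action."""
    return ACTIONS.get(choice.get("finish_reason"), "unknown")


action = next_action({"index": 0, "finish_reason": "length"})
```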

How do I control costs?

Limit output length with max_completion_tokens.
Choose a cost-efficient model (e.g., gpt-5.4-mini or gpt-5.4-nano for simpler tasks).
Keep prompts concise and avoid redundant context.
Monitor token usage through the usage field in the response.
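Monitoring usage can be as simple as turning the usage field into an estimated cost per request. In the sketch below the per-million-token prices are hypothetical placeholders; substitute the actual prices for the model you use.

```python
def estimate_cost(usage, input_price_per_million, output_price_per_million):
    """Estimate request cost in the same currency unit as the supplied prices."""
    return (
        usage["prompt_tokens"] * input_price_per_million
        + usage["completion_tokens"] * output_price_per_million
    ) / 1_000_000


# Hypothetical prices: 2.00 per million input tokens, 8.00 per million output tokens.
usage = {"prompt_tokens": 1200, "completion_tokens": 300, "total_tokens": 1500}
cost = estimate_cost(usage, 2.00, 8.00)
```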

Authorizations

Authorization

string

header

required

Bearer token authentication. Use your CometAPI key.

Body

application/json

model

string

default: gpt-5.4

required

Model ID to use for this request. See the Models page for current options.

Example:

"gpt-4.1"

messages

object[]

required

A list of messages forming the conversation. Each message has a role (system, developer, user, assistant, or tool) and content (text string or multimodal content array).

stream

boolean

If true, partial response tokens are delivered incrementally via server-sent events (SSE). The stream ends with a data: [DONE] message.

temperature

number

default: 1

Sampling temperature between 0 and 2. Higher values (e.g., 0.8) produce more random output; lower values (e.g., 0.2) make output more focused and deterministic. Adjust either this or top_p, but not both.

Required range: 0 <= x <= 2

top_p

number

default: 1

Nucleus sampling parameter. The model considers only the most likely tokens whose cumulative probability reaches top_p. For example, 0.1 means only the tokens making up the top 10% of probability mass are considered. Adjust either this or temperature, but not both.

Required range: 0 <= x <= 1

n

integer

default: 1

Number of completion choices to generate for each input message. Defaults to 1.

stop

string

Up to 4 sequences where the API will stop generating further tokens. Can be a string or an array of strings.

max_tokens

integer

Maximum number of tokens to generate in the completion. The total of input + output tokens is capped by the model's context length.

presence_penalty

number

default: 0

Number between -2.0 and 2.0. Positive values penalize tokens based on whether they have already appeared, encouraging the model to explore new topics.

Required range: -2 <= x <= 2

frequency_penalty

number

default: 0

Number between -2.0 and 2.0. Positive values penalize tokens proportionally to how often they have appeared, reducing verbatim repetition.

Required range: -2 <= x <= 2

logit_bias

object

A JSON object mapping token IDs to bias values from -100 to 100. The bias is added to the model's logits before sampling. Values between -1 and 1 subtly adjust likelihood; -100 or 100 effectively ban or force selection of a token.

user

string

A unique identifier for your end-user. Helps with abuse detection and monitoring.

max_completion_tokens

integer

An upper bound for the number of tokens to generate, including visible output tokens and reasoning tokens. Use this instead of max_tokens for GPT-4.1+, GPT-5 series, and o-series models.

response_format

object

Specifies the output format. Use {"type": "json_object"} for JSON mode, or {"type": "json_schema", "json_schema": {...}} for strict structured output.

tools

object[]

A list of tools the model may call. Currently supports function type tools.

tool_choice

default: auto

Controls how the model selects tools. auto (default): model decides. none: no tools. required: must call a tool.

logprobs

boolean

default: false

Whether to return log probabilities of the output tokens.

top_logprobs

integer

Number of most likely tokens to return at each position (0-20). Requires logprobs to be true.

Required range: 0 <= x <= 20

reasoning_effort

enum<string>

Controls the reasoning effort for o-series and GPT-5.1+ models.

Available options: low, medium, high

stream_options

object

Options for streaming. Only valid when stream is true.

service_tier

enum<string>

Specifies the processing tier.

Available options: auto, default, flex, priority

Response

200 - application/json

Successful chat completion response.

id

string

Unique completion identifier.

Example:

"chatcmpl-abc123"

object

enum<string>

Available options: chat.completion

Example:

"chat.completion"

created

integer

Unix timestamp of creation.

Example:

1774412483

model

string

The model used (may include version suffix).

Example:

"gpt-5.4-2025-07-16"

choices

object[]

Array of completion choices.

usage

object

service_tier

string

Example:

"default"

system_fingerprint

string | null

Example:

"fp_490a4ad033"
