Embeddings

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cometapi.com/v1",
    api_key="<COMETAPI_KEY>",
)

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="The food was delicious and the waiter was friendly.",
)

print(response.data[0].embedding[:5])  # First 5 dimensions
print(f"Dimensions: {len(response.data[0].embedding)}")
```

Example response (the embedding array is shortened to five values for display):

```json
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [-0.0021, -0.0491, 0.0209, 0.0314, -0.0453]
    }
  ],
  "model": "text-embedding-3-small",
  "usage": {
    "prompt_tokens": 2,
    "total_tokens": 2
  }
}
```

Overview

The Embeddings API turns text into vector representations that capture its semantic meaning. These vectors can be used for semantic search, clustering, classification, anomaly detection, and retrieval-augmented generation (RAG).

CometAPI supports embedding models from multiple providers. Send one or more text strings and receive numeric vectors that you can store in a vector database or use directly to compute similarity.
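Similarity between two embeddings is commonly measured with cosine similarity. The sketch below ranks candidate vectors against a query vector in plain Python; `cosine_similarity` and `rank_by_similarity` are illustrative names, and the toy 3-dimensional vectors stand in for real values such as `response.data[0].embedding`.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float], candidates: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (candidate_index, score) pairs, most similar first."""
    scores = [(i, cosine_similarity(query, c)) for i, c in enumerate(candidates)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings.
query = [1.0, 0.0, 1.0]
docs = [[0.0, 1.0, 0.0], [1.0, 0.1, 0.9], [-1.0, 0.0, -1.0]]
print(rank_by_similarity(query, docs))
```

OpenAI states that its embedding vectors are normalized to unit length; if that holds for the model you use, the plain dot product produces the same ranking at lower cost.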

Available models

Model	Dimensions	Max Tokens	Best for
`text-embedding-3-large`	3,072 (adjustable)	8,191	Highest-quality embeddings
`text-embedding-3-small`	1,536 (adjustable)	8,191	Cost-effective, fast
`text-embedding-ada-002`	1,536 (fixed)	8,191	Legacy compatibility

See the model list for all currently available embedding models and their pricing.

Important notes

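Dimension reduction: The text-embedding-3-* models accept a dimensions parameter that shortens the returned embedding vector without a significant loss of accuracy. Because storage grows linearly with vector length, this can cut storage costs by up to 75% (for example, requesting 768 of the 3,072 dimensions of text-embedding-3-large stores a quarter as many values) while retaining most of the semantic information.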
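The dimensions parameter is the direct way to get shorter vectors. If full-length vectors are already stored, a commonly used client-side approximation is to keep the first k values and rescale the result to unit length. The sketch below does that; treat it as an assumption about how shortened text-embedding-3-* vectors behave, not a guarantee that it reproduces the server-side result exactly. `shorten_embedding` is a hypothetical helper.

```python
import math
from typing import List, Sequence


def shorten_embedding(vector: Sequence[float], k: int) -> List[float]:
    """Keep the first k values of an embedding and rescale them to unit L2 norm."""
    if not 1 <= k <= len(vector):
        raise ValueError(f"k must be between 1 and {len(vector)}")
    head = list(vector[:k])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0:
        raise ValueError("truncated vector has zero norm")
    return [x / norm for x in head]


full = [0.6, 0.0, 0.8, 0.0]  # toy stand-in for a real embedding
short = shorten_embedding(full, 2)
print(short, math.sqrt(sum(x * x for x in short)))
```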

Batch input: You can embed multiple texts in a single request by passing an array of strings as the input parameter. This is considerably more efficient than sending a separate request for each piece of text.
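A minimal batching sketch, assuming `client` is the OpenAI SDK client configured as in the first example. `chunked`, `embed_in_batches`, and the default batch size of 100 are hypothetical choices, not documented CometAPI limits. Results are re-sorted by `index` defensively, even though the API returns them in input order.

```python
from typing import Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def embed_in_batches(
    client,
    texts: Sequence[str],
    model: str = "text-embedding-3-small",
    batch_size: int = 100,  # hypothetical; not a documented limit
) -> List[List[float]]:
    """Embed many texts with one request per batch; output order matches `texts`."""
    vectors: List[List[float]] = []
    for batch in chunked(texts, batch_size):
        response = client.embeddings.create(model=model, input=batch)
        # Sort by `index` so each vector lines up with its input in the batch.
        for item in sorted(response.data, key=lambda d: d.index):
            vectors.append(item.embedding)
    return vectors
```

For example, `embed_in_batches(client, ["first text", "second text"])` sends a single request and returns two vectors in the same order as the texts.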

Authorizations

Authorization (string, header, required)

Bearer token authentication. Use your CometAPI key.
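Outside the OpenAI SDK, the key is sent in a standard `Authorization: Bearer <key>` header. The sketch below builds, but does not send, such a request with Python's standard library. The endpoint URL is an assumption inferred from the SDK example, which appends `/embeddings` to `base_url`; reading the key from a `COMETAPI_KEY` environment variable is a hypothetical convention.

```python
import json
import os
import urllib.request

# Assumed endpoint: the SDK example's base_url plus the /embeddings path.
EMBEDDINGS_URL = "https://api.cometapi.com/v1/embeddings"


def build_embeddings_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Create a POST request carrying the Bearer token and a JSON body."""
    return urllib.request.Request(
        EMBEDDINGS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_embeddings_request(
    os.environ.get("COMETAPI_KEY", "<COMETAPI_KEY>"),  # hypothetical env var name
    {"model": "text-embedding-3-small", "input": "Hello"},
)
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would send it; omitted here so nothing hits the network.
```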

Body

application/json

model (string, required)

The embedding model to use. See the Models page for current embedding model IDs.

Example: "text-embedding-3-small"

input (required)

The text to embed: a single string, an array of strings, or an array of token arrays. Each input must not exceed the model's maximum token limit (8,191 tokens for the text-embedding-3-* models).
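The three accepted shapes of input look like this as request bodies. `describe_input` is a hypothetical client-side check, not part of the API, and the token IDs in the last body are placeholders rather than real tokenizer output.

```python
from typing import Any


def describe_input(value: Any) -> str:
    """Classify an `input` value as one of the three documented shapes."""
    if isinstance(value, str):
        return "single string"
    if isinstance(value, list) and value and all(isinstance(v, str) for v in value):
        return "array of strings"
    if (
        isinstance(value, list)
        and value
        and all(isinstance(v, list) and v and all(isinstance(t, int) for t in v) for v in value)
    ):
        return "array of token arrays"
    raise TypeError("input must be a string, a list of strings, or a list of token lists")


examples = [
    {"model": "text-embedding-3-small", "input": "A single sentence."},
    {"model": "text-embedding-3-small", "input": ["First text.", "Second text."]},
    {"model": "text-embedding-3-small", "input": [[101, 202, 303], [404, 505]]},  # placeholder IDs
]
for body in examples:
    print(describe_input(body["input"]))
```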

encoding_format (enum<string>, default: float)

The format of the returned embedding vectors. float returns an array of floating-point numbers; base64 returns a base64-encoded string, which can reduce response size for large batches.

Available options: float, base64
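With encoding_format set to base64, each vector arrives as a string that has to be decoded. The sketch below assumes the decoded bytes are little-endian 32-bit floats, which is how OpenAI's embeddings endpoint packs them; confirm this against an actual CometAPI response before relying on it. Both helper names are hypothetical.

```python
import base64
import struct
from typing import List, Sequence


def decode_base64_embedding(encoded: str) -> List[float]:
    """Decode a base64 string into float32 values (little-endian assumed)."""
    raw = base64.b64decode(encoded)
    if len(raw) % 4:
        raise ValueError("decoded length is not a multiple of 4 bytes")
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


def encode_base64_embedding(vector: Sequence[float]) -> str:
    """Inverse of the decoder, used here only to build a round-trip example."""
    return base64.b64encode(struct.pack(f"<{len(vector)}f", *vector)).decode("ascii")


sample = encode_base64_embedding([0.5, -0.25, 1.0])
print(sample, decode_base64_embedding(sample))
```

Under the float32 assumption, a 1,536-dimension vector is 6,144 bytes, which base64 encodes as 8,192 characters.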

dimensions (integer)

The number of dimensions for the output embedding vector. Supported only by the text-embedding-3-* models. Reducing dimensions can lower storage costs while preserving most of the embedding's usefulness.

Required range: x >= 1
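To see how the dimensions setting affects storage, the arithmetic below computes raw vector size. It assumes 4-byte float32 storage per value and ignores index and metadata overhead, which vary by vector database; the one million vectors and the value 768 are illustrative choices.

```python
def raw_vector_bytes(num_vectors: int, dimensions: int, bytes_per_value: int = 4) -> int:
    """Raw storage for float vectors, ignoring index and metadata overhead."""
    if dimensions < 1:
        raise ValueError("dimensions must be >= 1")
    return num_vectors * dimensions * bytes_per_value


full = raw_vector_bytes(1_000_000, 3072)    # text-embedding-3-large default size
reduced = raw_vector_bytes(1_000_000, 768)  # requested via dimensions=768
print(full, reduced, f"saved {1 - reduced / full:.0%}")
```

Because size is linear in the number of dimensions, the saving is simply one minus the ratio of requested to native dimensions.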

user (string)

A unique identifier for your end user, which can help monitor and detect abuse.
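Putting the body parameters together, the sketch below assembles a request payload and enforces the constraints listed above: encoding_format limited to float or base64, and dimensions only for text-embedding-3-* models with a value of at least 1. `build_embeddings_payload` is a hypothetical helper, and `user-1234` is a placeholder end-user identifier.

```python
from typing import Any, Dict, List, Optional, Union

EmbeddingInput = Union[str, List[str], List[List[int]]]


def build_embeddings_payload(
    model: str,
    input: EmbeddingInput,
    encoding_format: Optional[str] = None,
    dimensions: Optional[int] = None,
    user: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble a request body, including optional fields only when they are set."""
    payload: Dict[str, Any] = {"model": model, "input": input}
    if encoding_format is not None:
        if encoding_format not in ("float", "base64"):
            raise ValueError("encoding_format must be 'float' or 'base64'")
        payload["encoding_format"] = encoding_format
    if dimensions is not None:
        if not model.startswith("text-embedding-3-"):
            raise ValueError("dimensions is only supported by text-embedding-3-* models")
        if dimensions < 1:
            raise ValueError("dimensions must be >= 1")
        payload["dimensions"] = dimensions
    if user is not None:
        payload["user"] = user
    return payload


print(build_embeddings_payload("text-embedding-3-large", "Hello", dimensions=256, user="user-1234"))
```

The resulting dict can be sent as the JSON body of a POST request, or unpacked into the SDK call as `client.embeddings.create(**payload)`.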

Response

200 - application/json

A list of embedding vectors for the input text(s).

object (enum<string>)

The object type, always list.

Available options: list

Example: "list"

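data (object[])

An array of embedding objects, one per input text. When multiple inputs are provided, results are returned in the same order as the inputs. As in the example response, each element contains object ("embedding"), index (the position of the corresponding input), and embedding (the vector, or a base64 string when encoding_format is base64).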
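Each element's index field ties a vector back to its input. The sketch below pairs input texts with vectors from a parsed response body; `pair_inputs_with_embeddings` is a hypothetical helper, and the toy response uses made-up 2-dimensional vectors deliberately listed out of order to show that sorting by index restores input order.

```python
from typing import Any, Dict, List, Tuple


def pair_inputs_with_embeddings(
    body: Dict[str, Any], inputs: List[str]
) -> List[Tuple[str, List[float]]]:
    """Return (input_text, vector) pairs, ordered by each item's `index` field."""
    if body.get("object") != "list":
        raise ValueError("expected a response with object == 'list'")
    items = sorted(body["data"], key=lambda item: item["index"])
    if len(items) != len(inputs):
        raise ValueError("expected exactly one embedding per input")
    return [(inputs[item["index"]], item["embedding"]) for item in items]


toy_response = {  # made-up values for illustration
    "object": "list",
    "data": [
        {"object": "embedding", "index": 1, "embedding": [0.3, 0.4]},
        {"object": "embedding", "index": 0, "embedding": [0.1, 0.2]},
    ],
    "model": "text-embedding-3-small",
}
for text, vector in pair_inputs_with_embeddings(toy_response, ["first", "second"]):
    print(text, vector)
```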

model (string)

The model used to generate the embeddings.

Example: "text-embedding-3-small"

usage (object)

Token usage statistics for this request, with prompt_tokens and total_tokens fields as shown in the example response.
