调用 Gemini 图像模型指南

本指南演示如何通过 CometAPI 使用 Google Gen AI SDK 调用 Gemini 图像模型。内容涵盖：

文生图
图像到图像编辑
多图合成
保存生成的图像

基础 URL： https://api.cometapi.com
安装 SDK：pip install google-genai（Python）或 npm install @google/genai（Node.js）

设置

使用 CometAPI 的基础 URL 初始化客户端：

from google import genai
from google.genai import types
import os

COMETAPI_KEY = os.environ.get("COMETAPI_KEY") or "<YOUR_COMETAPI_KEY>"

client = genai.Client(
    http_options={"api_version": "v1beta", "base_url": "https://api.cometapi.com"},
    api_key=COMETAPI_KEY,
)

文本到图像生成

根据文本 Prompt 生成图像并将其保存到文件中。

from google import genai
from google.genai import types
from PIL import Image
import os

client = genai.Client(
    http_options={"api_version": "v1beta", "base_url": "https://api.cometapi.com"},
    api_key=os.environ.get("COMETAPI_KEY"),
)

response = client.models.generate_content(
    model="gemini-3.1-flash-image-preview",
    contents="Create a picture of a nano banana dish in a fancy restaurant with a Gemini theme",
    config=types.GenerateContentConfig(
        response_modalities=["TEXT", "IMAGE"],
    ),
)

for part in response.parts:
    if part.text is not None:
        print(part.text)
    elif part.inline_data is not None:
        image = part.as_image()
        image.save("generated_image.png")
        print("Image saved to generated_image.png")

响应结构： 图像数据位于 candidates[0].content.parts 中，其中可以包含文本部分和/或图像部分：

{
  "candidates": [{
    "content": {
      "parts": [
        { "text": "Here is your image..." },
        {
          "inlineData": {
            "mimeType": "image/png",
            "data": "<base64-encoded-image>"
          }
        }
      ]
    }
  }]
}

图生图生成

上传一张输入图片，并通过文本 Prompt 对其进行转换。

from google import genai
from google.genai import types
from PIL import Image
import os

client = genai.Client(
    http_options={"api_version": "v1beta", "base_url": "https://api.cometapi.com"},
    api_key=os.environ.get("COMETAPI_KEY"),
)

# Load the source image
source_image = Image.open("source.jpg")

response = client.models.generate_content(
    model="gemini-3.1-flash-image-preview",
    contents=["Transform this into a watercolor painting", source_image],
    config=types.GenerateContentConfig(
        response_modalities=["TEXT", "IMAGE"],
    ),
)

for part in response.parts:
    if part.text is not None:
        print(part.text)
    elif part.inline_data is not None:
        image = part.as_image()
        image.save("watercolor_output.png")

Python SDK 可直接接受 PIL.Image 对象——无需手动进行 Base64 编码。
传递原始 Base64 字符串时，不要包含 data:image/jpeg;base64, 前缀。

多图像合成

从多个输入图像生成一张新图像。CometAPI 支持两种方式：

方法 1：单张拼贴图

将多张源图像合并为一张拼贴图，然后描述期望的输出效果。

from google import genai
from google.genai import types
from PIL import Image
import os

client = genai.Client(
    http_options={"api_version": "v1beta", "base_url": "https://api.cometapi.com"},
    api_key=os.environ.get("COMETAPI_KEY"),
)

collage = Image.open("collage.jpg")

response = client.models.generate_content(
    model="gemini-3.1-flash-image-preview",
    contents=[
        "A model is posing and leaning against a pink BMW with a green alien keychain attached to a pink handbag, a pink parrot on her shoulder, and a pug wearing a pink collar and gold headphones",
        collage,
    ],
    config=types.GenerateContentConfig(
        response_modalities=["TEXT", "IMAGE"],
    ),
)

for part in response.parts:
    if part.inline_data is not None:
        part.as_image().save("composition_output.png")

方法 2：多张独立图像（最多 14 张）

直接传入多张图像。Gemini 3 模型最多支持 14 张参考图像（对象 + 角色）：

from google import genai
from google.genai import types
from PIL import Image
import os

client = genai.Client(
    http_options={"api_version": "v1beta", "base_url": "https://api.cometapi.com"},
    api_key=os.environ.get("COMETAPI_KEY"),
)

image1 = Image.open("image1.jpg")
image2 = Image.open("image2.jpg")
image3 = Image.open("image3.jpg")

response = client.models.generate_content(
    model="gemini-3.1-flash-image-preview",
    contents=["Merge the three images", image1, image2, image3],
    config=types.GenerateContentConfig(
        response_modalities=["TEXT", "IMAGE"],
    ),
)

for part in response.parts:
    if part.inline_data is not None:
        part.as_image().save("merged_output.png")

4K 图像生成

指定带有 aspect_ratio 和 image_size 的 image_config 以获得高分辨率输出：

from google import genai
from google.genai import types
import os

client = genai.Client(
    http_options={"api_version": "v1beta", "base_url": "https://api.cometapi.com"},
    api_key=os.environ.get("COMETAPI_KEY"),
)

response = client.models.generate_content(
    model="gemini-3.1-flash-image-preview",
    contents="Da Vinci style anatomical sketch of a Monarch butterfly on textured parchment",
    config=types.GenerateContentConfig(
        response_modalities=["TEXT", "IMAGE"],
        image_config=types.ImageConfig(
            aspect_ratio="1:1",
            image_size="4K",
        ),
    ),
)

for part in response.parts:
    if part.text is not None:
        print(part.text)
    elif image := part.as_image():
        image.save("butterfly_4k.png")

多轮图像编辑（Chat）

使用 SDK 的 chat 功能对图像进行迭代优化：

from google import genai
from google.genai import types
import os

client = genai.Client(
    http_options={"api_version": "v1beta", "base_url": "https://api.cometapi.com"},
    api_key=os.environ.get("COMETAPI_KEY"),
)

chat = client.chats.create(
    model="gemini-3.1-flash-image-preview",
    config=types.GenerateContentConfig(
        response_modalities=["TEXT", "IMAGE"],
    ),
)

# First turn: generate
response = chat.send_message(
    "Create a vibrant infographic explaining photosynthesis as a recipe, styled like a colorful kids cookbook"
)

for part in response.parts:
    if part.text is not None:
        print(part.text)
    elif image := part.as_image():
        image.save("photosynthesis.png")

# Second turn: refine
response = chat.send_message("Update this infographic to be in Spanish. Do not change any other elements.")

for part in response.parts:
    if part.text is not None:
        print(part.text)
    elif image := part.as_image():
        image.save("photosynthesis_spanish.png")

提示

Prompt 优化

指定风格关键词（例如：“cyberpunk、film grain、low contrast”）、宽高比、主体、背景、光照和细节级别。

Base64 格式

使用原始 HTTP 时，不要包含 data:image/png;base64, 前缀——只使用原始 Base64 字符串。Python SDK 会通过 PIL.Image 对象自动处理这一点。

强制输出图像

将 "responseModalities" 仅设置为 ["IMAGE"]，即可保证只输出图像而不包含文本。

更多详情，请参阅 API 参考。 官方文档： Gemini 图像生成

Gemini 图像理解

概览

API 参考

集成指南

错误处理

定价计费

帮助中心

设置

文本到图像生成

图生图生成

多图像合成

方法 1：单张拼贴图

方法 2：多张独立图像（最多 14 张）

4K 图像生成

多轮图像编辑（Chat）

提示

概览

API 参考

集成指南

错误处理

定价计费

帮助中心

​设置

​文本到图像生成

​图生图生成

​多图像合成

​方法 1：单张拼贴图

​方法 2：多张独立图像（最多 14 张）

​4K 图像生成

​多轮图像编辑（Chat）

​提示

设置

文本到图像生成

图生图生成

多图像合成

方法 1：单张拼贴图

方法 2：多张独立图像（最多 14 张）

4K 图像生成

多轮图像编辑（Chat）

提示