
Gemini Image Generation (Nano Banana)

Gemini can generate and process images conversationally. You can prompt the model with text, images, or both to create, modify, and iterate on visual content.

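Available models:

  • gemini-3.1-flash-image-preview (Nano Banana 2) - the recommended default, with the best balance of performance, intelligence, cost, and latency; supports Google Image Search grounding
  • gemini-3-pro-image-preview (Nano Banana Pro) - for professional asset production; up to 4K resolution and advanced reasoning
  • gemini-2.5-flash-image (Nano Banana) - fast and efficient, suited to high-volume, low-latency tasks

Official documentation: Gemini Image Generation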

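Asynchronous API

If you need to generate images in bulk, don't want to hold a connection open for a long time, or want an image URL instead of base64 data, use the asynchronous image generation API. Submitting a task returns immediately; you then poll for the result.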


Text-to-image generation

Generate an image from a text description.

POST /v1beta/models/{model}:generateContent

Body

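Parameter | Type | Required | Description
contents | array | Yes | Array of contents containing the text prompt
generationConfig | object | No | Generation configuration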

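generationConfig.imageConfig sub-properties

Parameter | Type | Description
aspectRatio | string | Aspect ratio. One of: 1:1, 1:4, 1:8, 2:3, 3:2, 3:4, 4:1, 4:3, 4:5, 5:4, 8:1, 9:16, 16:9, 21:9
imageSize | string | Image size (Gemini 3 models only). One of: 512px (3.1 Flash only), 1K, 2K, 4K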

imageSize casing

imageSize must use an uppercase K (1K, 2K, 4K). Lowercase values such as 1k are rejected.
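The casing rule and the allowed values above are easy to check on the client before a request goes out. The sketch below is a hypothetical helper (`make_image_config` is not part of any SDK) that builds the REST `imageConfig` object, normalizes a lowercase `k`, and rejects `512px` outside Gemini 3.1 Flash Image. It does not check which aspect ratios each individual model accepts.

```python
from typing import Optional

# Values copied from the imageConfig table above (not checked per model).
ALLOWED_ASPECT_RATIOS = {
    "1:1", "1:4", "1:8", "2:3", "3:2", "3:4", "4:1",
    "4:3", "4:5", "5:4", "8:1", "9:16", "16:9", "21:9",
}
ALLOWED_IMAGE_SIZES = {"512px", "1K", "2K", "4K"}


def make_image_config(model: str, aspect_ratio: str,
                      image_size: Optional[str] = None) -> dict:
    """Build a REST imageConfig dict, normalizing imageSize to an uppercase K."""
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        raise ValueError(f"unsupported aspectRatio: {aspect_ratio!r}")
    config = {"aspectRatio": aspect_ratio}
    if image_size is None:
        return config
    if not model.startswith("gemini-3"):
        raise ValueError("imageSize is only supported by Gemini 3 models")
    # "512px" keeps its lowercase unit; "1k"/"2k"/"4k" become "1K"/"2K"/"4K".
    size = "512px" if image_size.lower() == "512px" else image_size.upper()
    if size not in ALLOWED_IMAGE_SIZES:
        raise ValueError(f"unsupported imageSize: {image_size!r}")
    if size == "512px" and not model.startswith("gemini-3.1-flash"):
        raise ValueError("512px is only available on Gemini 3.1 Flash Image")
    config["imageSize"] = size
    return config
```

For example, `make_image_config("gemini-3-pro-image-preview", "16:9", "2k")` returns `{"aspectRatio": "16:9", "imageSize": "2K"}` instead of letting the server reject the lowercase value.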

generationConfig.responseModalities values

Value | Description
["IMAGE"] | Image only
["TEXT", "IMAGE"] | Text and image (default)
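For callers that do not use the SDK, the body fields described above can be assembled with the standard library alone. This is a minimal sketch assuming the `https://cdn.12ai.org` gateway used throughout this page; `build_generate_request` is a hypothetical name, and the function only builds the request without sending it.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Optional

BASE_URL = "https://cdn.12ai.org"  # gateway used by the examples on this page


def build_generate_request(model: str, prompt: str, api_key: str,
                           modalities: Optional[List[str]] = None,
                           image_config: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a generateContent POST request."""
    # Default to the documented default modalities, sent explicitly.
    generation_config = {"responseModalities": modalities or ["TEXT", "IMAGE"]}
    if image_config:
        generation_config["imageConfig"] = image_config
    body = {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": generation_config,
    }
    key = urllib.parse.quote(api_key, safe="")
    url = f"{BASE_URL}/v1beta/models/{model}:generateContent?key={key}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Send it with `urllib.request.urlopen(req)` and parse the reply with `json.load`. Image generation can take a while, so a generous `timeout` on `urlopen` is a sensible choice.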

Request examples

cURL:

curl -s -X POST \
  "https://cdn.12ai.org/v1beta/models/gemini-3-pro-image-preview:generateContent?key=$API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "parts": [
        {"text": "Create a picture of a cute cat playing in the sunshine"}
      ]
    }],
    "generationConfig": {
      "responseModalities": ["IMAGE"],
      "imageConfig": {
        "aspectRatio": "16:9"
      }
    }
  }' \
  | grep -o '"data": "[^"]*"' \
  | cut -d'"' -f4 \
  | base64 --decode > output.png

Python:

from google import genai
from google.genai import types

# Set the custom base URL directly on the Client
client = genai.Client(
    api_key="YOUR_API_KEY",
    http_options={"base_url": "https://cdn.12ai.org"}
)

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=["draw a pig"],
    config=types.GenerateContentConfig(
        response_modalities=["IMAGE"],
        image_config=types.ImageConfig(
            aspect_ratio="16:9",
            image_size="2K",
        ),
    ),
)

# Save the image
for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        image = part.as_image()
        image.save("output.png")
        print("Image saved as output.png")
        break
    elif part.text:
        print("Text response:", part.text)

Response example

Images in the response are returned as base64-encoded inlineData:

{
  "candidates": [{
    "content": {
      "parts": [{
        "inlineData": {
          "mimeType": "image/png",
          "data": "<BASE64_IMAGE_DATA>"
        }
      }],
      "role": "model"
    },
    "finishReason": "STOP"
  }],
  "usageMetadata": {
    "promptTokenCount": 10,
    "candidatesTokenCount": 1290,
    "totalTokenCount": 1300
  }
}
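The `grep | cut | base64` pipeline in the cURL examples assumes the response contains exactly one image. A sketch that walks the JSON instead and collects every `inlineData` image. Gemini 3 models may also return interim parts flagged with `"thought": true`; this hypothetical `extract_images` helper skips them by default. The snake_case fallbacks are a defensive assumption, not documented behavior.

```python
import base64
from typing import List, Tuple


def extract_images(response: dict, include_thoughts: bool = False) -> List[Tuple[str, bytes]]:
    """Return (mime_type, raw_bytes) for each inline image in a generateContent response."""
    images = []
    for candidate in response.get("candidates", []):
        for part in candidate.get("content", {}).get("parts", []):
            # Interim "thought" parts are skipped unless explicitly requested.
            if part.get("thought") and not include_thoughts:
                continue
            inline = part.get("inlineData") or part.get("inline_data")
            if inline:
                mime = inline.get("mimeType") or inline.get("mime_type", "")
                images.append((mime, base64.b64decode(inline["data"])))
    return images
```

Each returned byte string can be written straight to a file opened in `"wb"` mode; text parts are ignored.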

Image editing

Provide an image and a text prompt to modify the image.

Limitation

Images can only be uploaded as base64 data via inline_data.
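Because images must be sent inline as base64, the request body can also be built in a program rather than interpolated into a shell string. A minimal sketch with hypothetical helpers `inline_image_part` and `edit_request_body`; it mirrors the `inline_data` / `mime_type` field names used in the cURL example and guesses the MIME type from the file extension.

```python
import base64
import mimetypes
from pathlib import Path


def inline_image_part(path: str) -> dict:
    """Return an inline_data part holding a local image file as base64."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type for {path!r}")
    data = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return {"inline_data": {"mime_type": mime, "data": data}}


def edit_request_body(instruction: str, *image_paths: str) -> dict:
    """Text instruction followed by one or more images, as in the cURL example."""
    parts = [{"text": instruction}] + [inline_image_part(p) for p in image_paths]
    return {"contents": [{"parts": parts}]}
```

Writing the resulting body to a file and posting it with `curl -d @body.json` avoids the shell argument-length limit that a large `$IMG_BASE64` can run into.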

Request examples

cURL:

# Encode the image as base64 (-w0 disables line wrapping in GNU coreutils base64)
IMG_BASE64=$(base64 -w0 input.jpg)

curl -X POST \
  "https://cdn.12ai.org/v1beta/models/gemini-3-pro-image-preview:generateContent?key=$API_KEY" \
  -H 'Content-Type: application/json' \
  -d "{
    \"contents\": [{
      \"parts\": [
        {\"text\": \"Add a wizard hat to the cat in this image\"},
        {
          \"inline_data\": {
            \"mime_type\": \"image/jpeg\",
            \"data\": \"$IMG_BASE64\"
          }
        }
      ]
    }]
  }" \
  | grep -o '"data": "[^"]*"' \
  | cut -d'"' -f4 \
  | base64 --decode > edited.png

Python:

from google import genai
from google.genai import types
from PIL import Image

client = genai.Client(
    api_key="YOUR_API_KEY",
    http_options={"base_url": "https://cdn.12ai.org"}
)

image = Image.open("input.jpg")

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=[
        "Add a wizard hat to the cat in this image",
        image,
    ],
)

for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        part.as_image().save("edited.png")
        print("Edited image saved as edited.png")
        break

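More editing scenarios

Image editing supports many scenarios. Common ones:

Inpainting (semantic masking)

Describe the region to change in text and keep everything else unchanged: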

from google import genai
from google.genai import types
from PIL import Image

client = genai.Client(
    api_key="YOUR_API_KEY",
    http_options={"base_url": "https://cdn.12ai.org"}
)

living_room = Image.open("living_room.png")

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=[
        living_room,
        "Change only the blue sofa to be a vintage brown leather chesterfield sofa. "
        "Keep the rest of the room unchanged.",
    ],
)

for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        part.as_image().save("living_room_edited.png")
        break

Style transfer

Recreate an image in a different artistic style:

from google import genai
from google.genai import types
from PIL import Image

client = genai.Client(
    api_key="YOUR_API_KEY",
    http_options={"base_url": "https://cdn.12ai.org"}
)

city_image = Image.open("city.png")

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=[
        city_image,
        "Transform this photograph into the artistic style of Vincent van Gogh's "
        "'Starry Night'. Preserve the original composition but render all elements "
        "with swirling, impasto brushstrokes and a palette of deep blues and bright yellows.",
    ],
)

for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        part.as_image().save("city_style_transfer.png")
        break

Advanced composition: combining multiple images

Provide multiple images as context to create a new composite scene:

from google import genai
from google.genai import types
from PIL import Image

client = genai.Client(
    api_key="YOUR_API_KEY",
    http_options={"base_url": "https://cdn.12ai.org"}
)

dress_image = Image.open("dress.png")
model_image = Image.open("model.png")

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=[
        dress_image,
        model_image,
        "Create a professional e-commerce fashion photo. Take the blue floral dress "
        "from the first image and let the woman from the second image wear it. "
        "Generate a realistic full-body shot with natural lighting and shadows.",
    ],
)

for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        part.as_image().save("fashion_photo.png")
        break

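Multi-turn image conversations

Refine an image iteratively over several turns; a chat (multi-turn conversation) is the recommended way to iterate on images.

Request examples

cURL:

# Turn 1: generate an infographic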
curl -s -X POST \
  "https://cdn.12ai.org/v1beta/models/gemini-3-pro-image-preview:generateContent?key=$API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "role": "user",
      "parts": [
        {"text": "Create an infographic about photosynthesis for a 4th grader"}
      ]
    }],
    "generationConfig": {
      "responseModalities": ["TEXT", "IMAGE"]
    }
  }' > turn1.json

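# Turn 2 (request not shown): change the text to Spanish.
# The turn-1 response must be appended to the conversation history.

Python:

from google import genai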
from google.genai import types

client = genai.Client(
    api_key="YOUR_API_KEY",
    http_options={"base_url": "https://cdn.12ai.org"}
)

chat = client.chats.create(
    model="gemini-3-pro-image-preview",
    config=types.GenerateContentConfig(
        response_modalities=["TEXT", "IMAGE"],
    ),
)

# Turn 1: generate an infographic
response = chat.send_message(
    "Create a vibrant infographic that explains photosynthesis "
    "as if it were a recipe for a plant's favorite food."
)

for part in response.parts:
    if part.text is not None:
        print(part.text)
    elif image := part.as_image():
        image.save("photosynthesis.png")

# Turn 2: change the language to Spanish
response = chat.send_message(
    "Update this infographic to be in Spanish. Do not change any other elements.",
    config=types.GenerateContentConfig(
        image_config=types.ImageConfig(
            aspect_ratio="16:9",
            image_size="2K",
        ),
    ),
)

for part in response.parts:
    if part.text is not None:
        print(part.text)
    elif image := part.as_image():
        image.save("photosynthesis_spanish.png")

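Gemini 3 Pro Image advanced features

Gemini 3 Pro Image (gemini-3-pro-image-preview) is optimized for professional asset production and offers these advanced capabilities:

  • High-resolution output (1K / 2K / 4K)
  • Advanced text rendering (infographics, menus, diagrams, marketing assets)
  • Grounding with Google Search (generating images from real-time data)
  • Thinking mode (reasons through complex prompts and generates interim draft images before the final output)
  • Up to 14 reference images as input

High-resolution output

cURL: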

curl -s -X POST \
  "https://cdn.12ai.org/v1beta/models/gemini-3-pro-image-preview:generateContent?key=$API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{"parts": [{"text": "A detailed butterfly illustration in Da Vinci style"}]}],
    "generationConfig": {
      "responseModalities": ["TEXT", "IMAGE"],
      "imageConfig": {
        "aspectRatio": "1:1",
        "imageSize": "4K"
      }
    }
  }'

Python:

from google import genai
from google.genai import types

client = genai.Client(
    api_key="YOUR_API_KEY",
    http_options={"base_url": "https://cdn.12ai.org"}
)

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents="Da Vinci style anatomical sketch of a dissected Monarch butterfly. "
             "Detailed drawings on textured parchment with notes in English.",
    config=types.GenerateContentConfig(
        response_modalities=["TEXT", "IMAGE"],
        image_config=types.ImageConfig(
            aspect_ratio="1:1",
            image_size="4K",
        ),
    ),
)

for part in response.parts:
    if part.text is not None:
        print(part.text)
    elif image := part.as_image():
        image.save("butterfly_4k.png")

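Grounding with Google Search

The model can use Google Search to verify facts and generate images from real-time data (such as weather, stock prices, or recent events).

Note

When search is combined with image generation, image-based search results are not passed to the generation model. The response includes groundingMetadata with searchEntryPoint and groundingChunks fields.

cURL: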

curl -s -X POST \
  "https://cdn.12ai.org/v1beta/models/gemini-3-pro-image-preview:generateContent?key=$API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{"parts": [{"text": "Visualize the current weather forecast for San Francisco"}]}],
    "tools": [{"google_search": {}}],
    "generationConfig": {
      "responseModalities": ["TEXT", "IMAGE"],
      "imageConfig": {"aspectRatio": "16:9"}
    }
  }'

Python:

from google import genai
from google.genai import types

client = genai.Client(
    api_key="YOUR_API_KEY",
    http_options={"base_url": "https://cdn.12ai.org"}
)

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents="Visualize the current weather forecast for the next 5 days "
             "in San Francisco as a clean, modern weather chart.",
    config=types.GenerateContentConfig(
        response_modalities=["TEXT", "IMAGE"],
        image_config=types.ImageConfig(
            aspect_ratio="16:9",
        ),
        tools=[{"google_search": {}}],
    ),
)

for part in response.parts:
    if part.text is not None:
        print(part.text)
    elif image := part.as_image():
        image.save("weather.png")

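Multiple reference images

You can mix up to 14 reference images:

  • Up to 6 high-fidelity object images (to be included in the final image)
  • Up to 5 images of people (to keep characters consistent)
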
from google import genai
from google.genai import types
from PIL import Image

client = genai.Client(
    api_key="YOUR_API_KEY",
    http_options={"base_url": "https://cdn.12ai.org"}
)

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=[
        "An office group photo of these people, they are making funny faces.",
        Image.open("person1.png"),
        Image.open("person2.png"),
        Image.open("person3.png"),
        Image.open("person4.png"),
        Image.open("person5.png"),
    ],
    config=types.GenerateContentConfig(
        response_modalities=["TEXT", "IMAGE"],
        image_config=types.ImageConfig(
            aspect_ratio="5:4",
            image_size="2K",
        ),
    ),
)

for part in response.parts:
    if part.text is not None:
        print(part.text)
    elif image := part.as_image():
        image.save("office_group.png")
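The per-category limits can be checked before any images are uploaded. This sketch encodes the limits listed in this document; `check_reference_images` is a hypothetical helper, and the caller decides which images count as objects and which as people.

```python
# (max object images, max person images, max total), as listed in this document.
REFERENCE_LIMITS = {
    "gemini-3-pro-image-preview": (6, 5, 14),
    "gemini-3.1-flash-image-preview": (10, 4, 14),
    # No object/person split is documented here; only the total of 3 applies.
    "gemini-2.5-flash-image": (3, 3, 3),
}


def check_reference_images(model: str, objects: int, people: int) -> None:
    """Raise ValueError if the planned reference images exceed the listed limits."""
    max_objects, max_people, max_total = REFERENCE_LIMITS[model]
    if objects > max_objects:
        raise ValueError(f"{model}: at most {max_objects} object images, got {objects}")
    if people > max_people:
        raise ValueError(f"{model}: at most {max_people} person images, got {people}")
    if objects + people > max_total:
        raise ValueError(f"{model}: at most {max_total} reference images in total")
```

The example above (five people, no object images) passes for gemini-3-pro-image-preview but would fail for gemini-3.1-flash-image-preview, which allows at most 4 person images.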

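Thinking mode

The Gemini 3 Pro Image preview model uses a reasoning process ("thinking") for complex prompts. Thinking is enabled by default and cannot be disabled through the API. The model generates up to two interim "thought images" to refine the composition, then outputs the final high-quality image.

To inspect the thinking process: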

for part in response.parts:
    if part.thought:
        if part.text:
            print("Thought:", part.text)
        elif image := part.as_image():
            image.show()  # interim thought image
    else:
        if part.text:
            print(part.text)
        elif image := part.as_image():
            image.save("final.png")

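Thought signatures

Every response includes a thought_signature field, an encrypted representation of the model's internal thinking. In multi-turn conversations where you manage the history yourself, pass thought_signature back unchanged in the next turn. When you use the official SDK's chat feature (client.chats.create), signatures are handled automatically and need no manual handling.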
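With the REST API, managing history yourself means sending the model's previous turn back verbatim. A minimal sketch, assuming the REST JSON shape in which each part carries its signature as `thoughtSignature`; `append_turn` is a hypothetical helper that deep-copies the model content so no field is dropped or altered.

```python
import copy
from typing import List


def append_turn(history: List[dict], model_response: dict, user_text: str) -> List[dict]:
    """Return the `contents` list for the next generateContent call.

    The model's previous content is copied unchanged, so every part keeps its
    thoughtSignature exactly as the server sent it.
    """
    model_content = copy.deepcopy(model_response["candidates"][0]["content"])
    model_content.setdefault("role", "model")
    next_user = {"role": "user", "parts": [{"text": user_text}]}
    # Build a new list so the caller's history is not mutated.
    return history + [model_content, next_user]
```

The returned list is used as `contents` in the turn-2 request body, alongside the same `generationConfig` as turn 1.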


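Gemini 3.1 Flash Image new features

Gemini 3.1 Flash Image (gemini-3.1-flash-image-preview) is the efficient model in the Gemini 3 family, offering the best balance of performance, intelligence, cost, and latency.

Grounding with Google Image Search

Supported by Gemini 3.1 Flash Image only. The model can use web images retrieved through Google Image Search as visual context.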

from google import genai
from google.genai import types

client = genai.Client(
    api_key="YOUR_API_KEY",
    http_options={"base_url": "https://cdn.12ai.org"}
)

response = client.models.generate_content(
    model="gemini-3.1-flash-image-preview",
    contents="A detailed painting of a Timareta butterfly resting on a flower",
    config=types.GenerateContentConfig(
        response_modalities=["IMAGE"],
        tools=[
            types.Tool(google_search=types.GoogleSearch(
                search_types=types.SearchTypes(
                    web_search=types.WebSearch(),
                    image_search=types.ImageSearch()
                )
            ))
        ]
    ),
)

# Show source information (if available)
metadata = response.candidates[0].grounding_metadata if response.candidates else None
if metadata and metadata.search_entry_point:
    print(metadata.search_entry_point.rendered_content)

for part in response.parts:
    if image := part.as_image():
        image.save("butterfly.png")

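Source image display requirements

When you use image search grounding, you must:

  • Provide a link, recognizable to users as a link, to the web page that contains the source image
  • If you display a source image, provide a direct click path from the image to the page that contains it

The groundingMetadata in the API response contains:

  • imageSearchQueries: the specific queries the model used for visual context
  • groundingChunks: source information (uri for the landing page and image_uri for the direct image URL)
  • searchEntryPoint: HTML and CSS that meet the display requirements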
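The display requirements above can be met by rendering each grounding chunk as a link to its source page, with the thumbnail placed inside the same link. A sketch under an explicit assumption: this page does not give the exact nesting of `groundingChunks`, so the hypothetical `source_links_html` looks for `uri`, `image_uri`/`imageUri`, and `title` both on each chunk and one level down. It complements, rather than replaces, the ready-made `searchEntryPoint` HTML.

```python
import html
from typing import Iterable, List


def _chunk_fields(chunk: dict) -> dict:
    """Flatten one chunk. ASSUMPTION: fields may sit on the chunk or one level down."""
    fields = dict(chunk)
    for value in chunk.values():
        if isinstance(value, dict):
            fields.update(value)
    return fields


def source_links_html(grounding_chunks: Iterable[dict]) -> str:
    """Render each source page as a link; a thumbnail, if any, sits inside that link."""
    items: List[str] = []
    for chunk in grounding_chunks:
        fields = _chunk_fields(chunk)
        page = fields.get("uri")
        if not page:
            continue  # without a landing page there is nothing to link to
        image = fields.get("image_uri") or fields.get("imageUri")
        label = html.escape(fields.get("title") or page)
        href = html.escape(page, quote=True)
        thumb = f'<img src="{html.escape(image, quote=True)}" alt="">' if image else ""
        items.append(f'<li><a href="{href}">{thumb}{label}</a></li>')
    return "<ul>" + "".join(items) + "</ul>"
```

Wrapping the thumbnail in the page link gives the "direct click path from the image to the page" that the requirements ask for.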

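Controlling the thinking level

Gemini 3.1 Flash Image lets you control how much thinking the model does, trading quality against latency. The default level is minimal; the supported levels are minimal and high.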

from google import genai
from google.genai import types

client = genai.Client(
    api_key="YOUR_API_KEY",
    http_options={"base_url": "https://cdn.12ai.org"}
)

response = client.models.generate_content(
    model="gemini-3.1-flash-image-preview",
    contents="A futuristic city built inside a giant glass bottle floating in space",
    config=types.GenerateContentConfig(
        response_modalities=["IMAGE"],
        thinking_config=types.ThinkingConfig(
            thinking_level="high",
            include_thoughts=True  # return the thought parts in the response
        ),
    ),
)

for part in response.parts:
    if part.thought:  # skip thought output
        continue
    if image := part.as_image():
        image.save("city.png")

Thinking token billing

Thinking tokens are billed whether include_thoughts is true or false, because thinking runs by default.
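Since thinking tokens are charged either way, it helps to read them from the response. A sketch that splits a REST `usageMetadata` object, assuming the thinking count is reported as `thoughtsTokenCount` (absent when the model did not think); `summarize_usage` is hypothetical and applies no prices.

```python
def summarize_usage(usage: dict) -> dict:
    """Split a REST usageMetadata object into prompt, visible-output and thinking tokens."""
    prompt = usage.get("promptTokenCount", 0)
    output = usage.get("candidatesTokenCount", 0)
    # Reported whether or not include_thoughts was set.
    thinking = usage.get("thoughtsTokenCount", 0)
    return {
        "prompt": prompt,
        "output": output,
        "thinking": thinking,
        "output_plus_thinking": output + thinking,
        # Fall back to the sum if the server omitted the total.
        "reported_total": usage.get("totalTokenCount", prompt + output + thinking),
    }
```

Comparing `thinking` across requests sent with thinking_level "minimal" and "high" shows how much the higher level adds.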

512px resolution

Gemini 3.1 Flash Image adds a smaller 512-pixel (0.5K) resolution option, suited to quick previews or low-bandwidth scenarios.

response = client.models.generate_content(
    model="gemini-3.1-flash-image-preview",
    contents="A cute cat icon",
    config=types.GenerateContentConfig(
        response_modalities=["IMAGE"],
        image_config=types.ImageConfig(
            aspect_ratio="1:1",
            image_size="512px",
        ),
    ),
)

Reference image limits

Gemini 3.1 Flash Image supports:

  • Up to 10 high-fidelity object images (to be included in the final image)
  • Up to 4 images of people (to keep characters consistent)

---

Aspect ratios and resolutions

gemini-2.5-flash-image

Aspect ratio | Resolution | Tokens
1:1 | 1024x1024 | 1290
2:3 | 832x1248 | 1290
3:2 | 1248x832 | 1290
3:4 | 864x1184 | 1290
4:3 | 1184x864 | 1290
4:5 | 896x1152 | 1290
5:4 | 1152x896 | 1290
9:16 | 768x1344 | 1290
16:9 | 1344x768 | 1290
21:9 | 1536x672 | 1290

gemini-3.1-flash-image-preview

Aspect ratio | 512px resolution | 512px tokens | 1K resolution | 1K tokens | 2K resolution | 2K tokens | 4K resolution | 4K tokens
1:1 | 512x512 | 747 | 1024x1024 | 1120 | 2048x2048 | 1120 | 4096x4096 | 2000
1:4 | 256x1024 | 747 | 512x2048 | 1120 | 1024x4096 | 1120 | 2048x8192 | 2000
1:8 | 192x1536 | 747 | 384x3072 | 1120 | 768x6144 | 1120 | 1536x12288 | 2000
2:3 | 424x632 | 747 | 848x1264 | 1120 | 1696x2528 | 1120 | 3392x5056 | 2000
3:2 | 632x424 | 747 | 1264x848 | 1120 | 2528x1696 | 1120 | 5056x3392 | 2000
3:4 | 448x600 | 747 | 896x1200 | 1120 | 1792x2400 | 1120 | 3584x4800 | 2000
4:1 | 1024x256 | 747 | 2048x512 | 1120 | 4096x1024 | 1120 | 8192x2048 | 2000
4:3 | 600x448 | 747 | 1200x896 | 1120 | 2400x1792 | 1120 | 4800x3584 | 2000
4:5 | 464x576 | 747 | 928x1152 | 1120 | 1856x2304 | 1120 | 3712x4608 | 2000
5:4 | 576x464 | 747 | 1152x928 | 1120 | 2304x1856 | 1120 | 4608x3712 | 2000
8:1 | 1536x192 | 747 | 3072x384 | 1120 | 6144x768 | 1120 | 12288x1536 | 2000
9:16 | 384x688 | 747 | 768x1376 | 1120 | 1536x2752 | 1120 | 3072x5504 | 2000
16:9 | 688x384 | 747 | 1376x768 | 1120 | 2752x1536 | 1120 | 5504x3072 | 2000
21:9 | 792x336 | 747 | 1584x672 | 1120 | 3168x1344 | 1120 | 6336x2688 | 2000
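The gemini-3.1-flash-image-preview table follows a simple pattern: each row's 2K and 4K sizes are 2x and 4x the 1K size, and the 512px size is half of it. A sketch that assumes this pattern holds for every row, storing only the 1K column and deriving the rest; `output_size` and `estimate_output_tokens` are hypothetical helpers, and the token estimate covers image output only (no prompt or thinking tokens).

```python
from typing import Tuple

# 1K output sizes (width, height) for gemini-3.1-flash-image-preview, from the table above.
SIZE_1K = {
    "1:1": (1024, 1024), "1:4": (512, 2048), "1:8": (384, 3072),
    "2:3": (848, 1264), "3:2": (1264, 848), "3:4": (896, 1200),
    "4:1": (2048, 512), "4:3": (1200, 896), "4:5": (928, 1152),
    "5:4": (1152, 928), "8:1": (3072, 384), "9:16": (768, 1376),
    "16:9": (1376, 768), "21:9": (1584, 672),
}
# ASSUMPTION: every size is a fixed multiple of the 1K size.
SCALE = {"512px": 0.5, "1K": 1, "2K": 2, "4K": 4}
TOKENS_PER_IMAGE = {"512px": 747, "1K": 1120, "2K": 1120, "4K": 2000}


def output_size(aspect_ratio: str, image_size: str = "1K") -> Tuple[int, int]:
    """Expected pixel size for an aspect ratio and imageSize."""
    width, height = SIZE_1K[aspect_ratio]
    scale = SCALE[image_size]
    return int(width * scale), int(height * scale)


def estimate_output_tokens(image_size: str, n_images: int) -> int:
    """Image output tokens for a batch of n_images at one size."""
    return TOKENS_PER_IMAGE[image_size] * n_images
```

Note that 1K and 2K images cost the same number of tokens in this table, so 2K is the cheaper way to get more pixels per token.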
---

Prompting tips

Photorealistic scenes

Use photography terms: camera angle, lens type, lighting, and fine detail.

A photorealistic close-up portrait of an elderly Japanese ceramicist
with deep wrinkles and a warm smile. Soft, golden hour light streaming
through a window. Captured with an 85mm portrait lens with soft bokeh.

Stylized illustrations

Specify the style explicitly:

A kawaii-style sticker of a happy red panda wearing a bamboo hat.
Bold, clean outlines, simple cel-shading, vibrant colors. White background.

Accurate text rendering

State the exact text and the font style clearly:

Create a modern, minimalist logo for a coffee shop called 'The Daily Grind'.
Clean, bold, sans-serif font. Black and white color scheme.
Put the logo in a circle. Use a coffee bean in a clever way.

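Limitations

  • Image generation does not support audio or video input
  • The model may not generate exactly the number of images you ask for
  • Reference image limits:
      • gemini-2.5-flash-image: up to 3 input images
      • gemini-3.1-flash-image-preview: up to 10 object images + 4 images of people, 14 in total
      • gemini-3-pro-image-preview: up to 6 object images + 5 images of people, up to 14 in total
  • All generated images carry a SynthID watermark
  • When an image needs text, generate the text first, then ask for an image that contains it
  • Recommended languages: English, Chinese, Japanese, Korean, French, German, Spanish, Portuguese, Russian, Italian, and others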

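Model selection

Model | Use case | Highlights
gemini-3.1-flash-image-preview | Recommended default for everyday image generation | Best balance of performance, cost, and latency; image search grounding, thinking-level control, 512px to 4K resolution
gemini-3-pro-image-preview | Professional asset production, complex instructions | 4K, advanced reasoning, search grounding, 14 reference images
gemini-2.5-flash-image | High-volume, low-latency tasks | Fast, 1024px resolution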