Multimodal Content Generation with Gemini on Vertex AI (Solution)

Overview

Labs are timed and cannot be paused. The timer starts when you click Start Lab.
The included IDE is preconfigured with the gcloud SDK.
Use the terminal to execute commands and then click Check my progress to verify your work.

Challenge scenario

Scenario: You are working on a project where you need to generate descriptive content based on images and text prompts. You’re utilizing Google Cloud's Vertex AI, specifically a generative model that can process multimodal inputs (i.e., a mix of text and images) to produce creative or informative outputs. Your task is to create a Python script that loads images from Cloud Storage, integrates them with text prompts, and uses a generative model to generate content based on these inputs.

Task: Develop a Python function named load_image_from_url(prompt). This function should invoke the gemini-2.0-flash model using the supplied prompt, generate the response.

Follow these steps to interact with the Generative AI APIs using Vertex AI Python SDK.

Click File > New File to open a new file within the Code Editor.
Write the Python code to use Google's Vertex AI SDK to interact with the pre-trained Text Generation AI model.
Create and save the python file.
Execute the Python file by invoking the below command by replacing the FILE_NAME inside the terminal within the Code Editor pane to view the output.

/usr/bin/python3 /FILE_NAME.py

Click Check my progress to verify the objective.

Send a multimodal prompt to Gen AI and receive a response

Solution of Lab

https://youtu.be/lGIXcw9jt9k

Create a file main.py

import vertexai
from vertexai.generative_models import GenerativeModel, Part

# -----------------------------
# CONFIG
# -----------------------------
PROJECT_ID = "YOUR_PROJECT_ID"   # replace
LOCATION = "YOUR_REGION"

vertexai.init(project=PROJECT_ID, location=LOCATION)

def load_image_from_url(prompt):
    """
    Generates content using Gemini 2.0 Flash
    with image + text input
    """

    model = GenerativeModel("gemini-2.0-flash")

    image = Part.from_uri(
        uri="gs://cloud-samples-data/vision/landmark/eiffel_tower.jpg",
        mime_type="image/jpeg"
    )

    response = model.generate_content(
        [image, prompt]
    )

    return response.text


if __name__ == "__main__":
    prompt = "Describe this image in detail and explain what makes it unique."

    print("Prompt:", prompt)
    print("\nModel Response:\n")

    try:
        print(load_image_from_url(prompt))
    except Exception as e:
        print("Error:", e)

/usr/bin/python3 /main.py

David Nguyen

David Nguyen

Multimodal Content Generation with Gemini on Vertex AI (Solution)

Table of Contents

Overview

Challenge scenario

Solution of Lab