Skip to content

Quickstart: Gemini Multimodal Live + MultiAgentEval

Evaluate the latest Gemini agents using the harness for multimodal and reasoning benchmarks.

1. Setup Your Agent API

Expose your Gemini agent via an API that standardizes the input and output.

import google.generativeai as genai
from fastapi import FastAPI

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro")

app = FastAPI()

@app.post("/execute_task")
async def execute(request: dict):
    # Standardize the prompt for Gemini
    response = model.generate_content(request["input"])
    return {"content": response.text}

2. Register the Agent

multiagent-eval evaluate --path scenarios/ --agent http://localhost:8000/execute_task --agent-name "Gemini-1.5-Pro-Live"

3. Generate Verified Report

multiagent-eval report --path runs/run.jsonl --share