The Gemini API can handle text, audio, and images at the same time. The material I'm providing includes the API's usage documentation and sample references. I want to use the Gemini API to build a web app and deploy it to Vercel. The specific functionality: a user opens the page, uploads an image to edit, and describes the desired edits by voice input; the page calls the Gemini API to perform the image edit and displays the result; the user can then continue with voice input to further edit the modified image or an earlier one. The model to use is: gemini-2.0-flash-exp-image-generation
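As a rough sketch of the server side of this app (not taken from the docs below): a Vercel serverless function could accept the uploaded image and the transcribed voice instruction, forward both to the model, and return any image part found in the response. The route path (`api/edit.js`), the request body shape, and the `responseModalities` option are assumptions to verify against the current SDK.

```javascript
// api/edit.js — hypothetical Vercel serverless route; the route name, request
// shape, and the responseModalities option are assumptions, not confirmed API.

// Build the multimodal request: the current image plus the spoken instruction.
export function buildEditParts(base64Image, mimeType, instruction) {
  return [
    { inlineData: { data: base64Image, mimeType } },
    { text: instruction },
  ];
}

export default async function handler(req, res) {
  // Dynamic import keeps the SDK dependency inside the handler.
  const { GoogleGenerativeAI } = await import("@google/generative-ai");
  const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
  const model = genAI.getGenerativeModel({
    model: "gemini-2.0-flash-exp-image-generation",
    // Image output generally requires enabling the Image response modality;
    // check the exact option name in the SDK version you install.
    generationConfig: { responseModalities: ["Text", "Image"] },
  });

  const { image, mimeType, instruction } = req.body; // assumed JSON body
  const result = await model.generateContent(
    buildEditParts(image, mimeType, instruction)
  );

  // An edited image, if returned, arrives as an inlineData part alongside any text.
  const parts = result.response.candidates?.[0]?.content?.parts ?? [];
  res.status(200).json({
    image: parts.find((p) => p.inlineData)?.inlineData?.data ?? null,
    text: parts.find((p) => p.text)?.text ?? null,
  });
}
```

The browser would POST the image (base64) and the transcript from speech recognition to this route, then render the returned base64 image.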


Official text generation documentation, Node.js:

The Gemini API can generate text output when provided text, images, video, and audio as input. This guide shows you how to generate text using the generateContent and streamGenerateContent methods. To learn about working with Gemini's vision and audio capabilities, refer to the Vision and Audio guides.

Generate text from text-only input

The simplest way to generate text using the Gemini API is to provide the model with a single text-only input, as shown in this example:

import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI("GEMINI_API_KEY");
const model = genAI.getGenerativeModel({ model: "gemini-1.5-flash" });

const prompt = "Explain how AI works";

const result = await model.generateContent(prompt);
console.log(result.response.text());

In this case, the prompt ("Explain how AI works") doesn't include any output examples, system instructions, or formatting information. It's a zero-shot approach. For some use cases, a one-shot or few-shot prompt might produce output that's more aligned with user expectations. In some cases, you might also want to provide system instructions to help the model understand the task or follow specific guidelines.

Generate text from text-and-image input

The Gemini API supports multimodal inputs that combine text with media files. The following example shows how to generate text from text-and-image input:

import { GoogleGenerativeAI } from "@google/generative-ai";
import * as fs from "node:fs";

const genAI = new GoogleGenerativeAI("GEMINI_API_KEY");
const model = genAI.getGenerativeModel({ model: "gemini-1.5-flash" });

function fileToGenerativePart(path, mimeType) {
  return {
    inlineData: {
      data: Buffer.from(fs.readFileSync(path)).toString("base64"),
      mimeType,
    },
  };
}

const prompt = "Describe how this product might be manufactured.";
const imagePart = fileToGenerativePart("/path/to/image.png", "image/png");

const result = await model.generateContent([prompt, imagePart]);
console.log(result.response.text());
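The helper above only wraps raw file bytes: a base64 string plus its MIME type, in the inlineData shape the API expects. As a quick sanity check (a sketch using a throwaway temp file), base64-decoding the part's data recovers the original bytes:

```javascript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Same shape as the doc's helper: base64-encoded bytes plus their MIME type.
function fileToGenerativePart(filePath, mimeType) {
  return {
    inlineData: {
      data: fs.readFileSync(filePath).toString("base64"),
      mimeType,
    },
  };
}

// Write a few known bytes (the PNG magic-number prefix) to a temp file...
const tmp = path.join(os.tmpdir(), "inline-data-demo.bin");
fs.writeFileSync(tmp, Buffer.from([0x89, 0x50, 0x4e, 0x47]));

// ...and confirm the round trip: decoding `data` yields the same bytes back.
const part = fileToGenerativePart(tmp, "image/png");
const decoded = Buffer.from(part.inlineData.data, "base64");
```

For the image-editing app, the uploaded file would go through exactly this kind of conversion before being sent to the model.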

Generate a text stream

By default, the model returns a response after completing the entire text generation process. You can achieve faster interactions by not waiting for the entire result, and instead use streaming to handle partial results. The following example shows how to implement streaming using the streamGenerateContent method to generate text from a text-only input prompt.

import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI("GEMINI_API_KEY");
const model = genAI.getGenerativeModel({ model: "gemini-1.5-flash" });

const prompt = "Explain how AI works";

const result = await model.generateContentStream(prompt);

for await (const chunk of result.stream) {
  const chunkText = chunk.text();
  process.stdout.write(chunkText);
}
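When streaming, it's common to both render chunks as they arrive and keep the full accumulated text (e.g. to store in chat history afterwards). A small sketch, using a mock async iterable in place of the SDK's result.stream — the chunk shape (an object with a text() method) is assumed from the example above:

```javascript
// Accumulate streamed chunks into one string while writing them out incrementally.
async function collectStream(stream) {
  let full = "";
  for await (const chunk of stream) {
    const text = chunk.text();
    process.stdout.write(text); // render incrementally
    full += text;               // keep the complete response
  }
  return full;
}

// Mock stream imitating the SDK chunk shape, for illustration only.
async function* mockStream() {
  for (const text of ["AI systems ", "learn patterns ", "from data."]) {
    yield { text: () => text };
  }
}
```

Here `await collectStream(mockStream())` prints the chunks as they arrive and resolves to the concatenated text; the real result.stream can be consumed the same way.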

Create a chat conversation

The Gemini SDK lets you collect multiple rounds of questions and responses, allowing users to step incrementally toward answers or get help with multipart problems. This SDK feature provides an interface to keep track of conversation history, but behind the scenes uses the same generateContent method to create the response. The following code example shows a basic chat implementation:

import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI("GEMINI_API_KEY");
const model = genAI.getGenerativeModel({ model: "gemini-1.5-flash" });

const chat = model.startChat({
  history: [
    {
      role: "user",
      parts: [{ text: "Hello" }],
    },
    {
      role: "model",
      parts: [{ text: "Great to meet you. What would you like to know?" }],
    },
  ],
});

let result = await chat.sendMessage("I have 2 dogs in my house.");
console.log(result.response.text());
let result2 = await chat.sendMessage("How many paws are in my house?");
console.log(result2.response.text());

You can also use streaming with chat, as shown in the following example:

import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI("GEMINI_API_KEY");
const model = genAI.getGenerativeModel({ model: "gemini-1.5-flash" });

const chat = model.startChat({
  history: [
    {
      role: "user",
      parts: [{ text: "Hello" }],
    },
    {
      role: "model",
      parts: [{ text: "Great to meet you. What would you like to know?" }],
    },
  ],
});

let result = await chat.sendMessageStream("I have 2 dogs in my house.");
for await (const chunk of result.stream) {
  const chunkText = chunk.text();
  process.stdout.write(chunkText);
}
let result2 = await chat.sendMessageStream("How many paws are in my house?");
for await (const chunk of result2.stream) {
  const chunkText = chunk.text();
  process.stdout.write(chunkText);
}
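The history format used by startChat is plain data: an array of { role, parts } turns. An app can therefore maintain it itself — for the iterative image-editing flow described at the top, each user turn could carry the current image plus the spoken instruction. A sketch with a hypothetical appendEditTurn helper (whether image parts in chat history are accepted by a given model should be verified):

```javascript
// Append one user turn carrying an image (inlineData part) and a
// voice-transcribed instruction (text part); returns a new history array.
function appendEditTurn(history, base64Image, mimeType, instruction) {
  return [
    ...history,
    {
      role: "user",
      parts: [
        { inlineData: { data: base64Image, mimeType } },
        { text: instruction },
      ],
    },
  ];
}

// Two editing rounds on the same (placeholder) image data.
let history = appendEditTurn([], "QUJD", "image/png", "Remove the background");
history = appendEditTurn(history, "QUJD", "image/png", "Now make it warmer");
```

The resulting array can be passed as history to startChat, or truncated to an earlier turn when the user wants to re-edit a previous version of the image.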