The Image Description Generator uses GPT-4o to return structured JSON output. Inputs are defined in toolConfig.ts, and the output is rendered automatically by the OutputLayout component (components/output/OutputLayout.tsx), regardless of the JSON structure.

Image Description Generator using GPT-4o

Pre-requisites

To run the app, you must have Supabase and OpenAI set up. If you haven’t done this yet, please start there.

Storage

Set up audio, PDF, and image storage using Cloudflare R2.

That’s all you need to get the app running.

Review lib/types/toolconfig.ts to understand the various configuration fields in the demo app.

Building variations

The app is designed for easy customization and variation generation because of:

  • Automatic input capturing
  • Automatic output rendering, regardless of JSON structure

Prompt and JSON schema

Start by creating a new prompt and JSON schema for your app.

Use the existing prompts and JSON schemas for guidance. Follow the same principles and structure as in the demo prompts.
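As a rough illustration of what such a schema might look like, the sketch below uses the general shape of an OpenAI function-calling definition. The function name generate_image_description and the fields title, description, and tags are hypothetical placeholders, not the demo app's actual schema.

```typescript
// Hypothetical function schema. The name and every property below are
// illustrative assumptions; replace them with the fields your app needs.
const functionSchema = {
  name: "generate_image_description",
  description: "Return a structured description of the provided image.",
  parameters: {
    type: "object",
    properties: {
      title: { type: "string", description: "Short title for the image" },
      description: {
        type: "string",
        description: "Description written in the requested style",
      },
      tags: {
        type: "array",
        items: { type: "string" },
        description: "Keywords that appear in the image",
      },
    },
    // Listing every property as required keeps the output shape predictable.
    required: ["title", "description", "tags"],
  },
};
```

Because the output is rendered regardless of structure, adding or renaming properties here should not require changes to the display code.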

Your prompt.ts file should manage user input like this:

prompt.ts
export function generatePrompt(body: any) {
  const { descriptionType } = body;

  return (
    "Generate a detailed and engaging description for the provided image. The description should be informative, concise, and tailored to the specified type. Ensure the output is in valid JSON format and adheres strictly to the function schema.\n" +
    "INPUTS:\n" +
    `Description Type: ${descriptionType}\n`
  );
}

Take note of the input variables (descriptionType). We’ll use these later.

Automatic input capturing

Update toolConfig.ts to include:

  1. The input variables defined in the prompt
    • Each input variable in prompt.ts must match the name of the corresponding field in toolConfig.ts.
  2. The button text for the form
  3. The type of model used
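The three items above could be expressed roughly as follows. This is a minimal sketch: the interface and the property names (fields, submitText, aiModel) are assumptions made for illustration, and the select options are invented. The authoritative definitions are in lib/types/toolconfig.ts.

```typescript
// Hypothetical shapes for illustration only; check lib/types/toolconfig.ts
// for the real field definitions.
interface FormFieldSketch {
  label: string;
  name: string; // must equal the variable destructured in prompt.ts
  type: "input" | "textarea" | "select";
  options?: string[];
}

interface ToolConfigSketch {
  type: string; // tells InputCapture what to render and which endpoint to call
  aiModel: string; // the model used for the request
  submitText: string; // button text for the form
  fields: FormFieldSketch[];
}

const toolConfigSketch: ToolConfigSketch = {
  type: "vision",
  aiModel: "gpt-4o",
  submitText: "Generate description",
  fields: [
    {
      label: "Description type",
      name: "descriptionType",
      type: "select",
      options: ["Product listing", "Alt text", "Social caption"], // invented examples
    },
  ],
};

// Consistency check: every prompt variable needs a field with the same name.
const promptVariables = ["descriptionType"];
const missingFields = promptVariables.filter(
  (variable) => !toolConfigSketch.fields.some((field) => field.name === variable)
);
```

An empty missingFields array means every variable used in the prompt has a matching form field.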

The InputCapture component automatically renders an upload form, uploads the selected image to Cloudflare R2, receives a URL for the stored file, and sends that URL to GPT-4o for analysis.

Ensure the type field in toolConfig.ts is specified correctly. InputCapture.tsx uses this to determine what to include (fields, file uploads) and which API endpoint to call.
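To make the role of the type field concrete, here is a minimal sketch of how a component could map it to form behavior. The function planCapture, the type values, and the endpoint paths are all hypothetical; InputCapture.tsx may be organized quite differently.

```typescript
// Hypothetical dispatch on the tool type. The endpoint paths are placeholders,
// not the demo app's real routes.
interface CapturePlan {
  showFileUpload: boolean;
  endpoint: string;
}

function planCapture(type: string): CapturePlan {
  switch (type) {
    case "vision":
      // Image tools need an upload form in addition to the text fields.
      return { showFileUpload: true, endpoint: "/api/vision" };
    case "text":
      return { showFileUpload: false, endpoint: "/api/text" };
    default:
      // Failing loudly makes a mistyped type easy to spot during prototyping.
      throw new Error(`Unsupported tool type: ${type}`);
  }
}
```

A mistyped type in this sketch throws immediately instead of silently posting to the wrong endpoint, which is one reason to specify the field carefully.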

The page.tsx file in the /app folder of the demo app reads toolConfig.ts and passes it to the InputCapture component, which builds the form automatically from that configuration.

Automatic output rendering

The goal is to allow rapid app generation, input capture, and output display for fast prototyping. Then, you can refine the output display for a more polished presentation.

app/(apps)/vision/[id]/page.tsx automatically fetches the data from Supabase based on the UUID and renders the JSON, no matter the structure.

The OutputLayout component handles all the heavy lifting, automatically fetching and displaying the JSON. Review it for better understanding.

app/(apps)/vision/[id]/page.tsx
<OutputLayout params={params} toolConfig={toolConfig} />
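One common way to display JSON without knowing its shape in advance is to walk it recursively and emit one row per leaf value. The sketch below illustrates that idea only; flattenJson and the Row type are hypothetical names, and this is not necessarily how OutputLayout is implemented.

```typescript
// Any valid JSON value.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

interface Row {
  path: string; // e.g. "tags[0]" or "meta.width"
  value: string; // leaf value converted to text for display
}

// Recursively walks the value and returns one row per leaf.
// Empty arrays and empty objects produce no rows.
function flattenJson(value: Json, path: string = ""): Row[] {
  if (value === null || typeof value !== "object") {
    return [{ path, value: String(value) }];
  }
  const rows: Row[] = [];
  if (Array.isArray(value)) {
    value.forEach((item, index) => {
      rows.push(...flattenJson(item, `${path}[${index}]`));
    });
  } else {
    for (const key of Object.keys(value)) {
      rows.push(...flattenJson(value[key], path ? `${path}.${key}` : key));
    }
  }
  return rows;
}
```

For example, flattenJson({ title: "Sunset", tags: ["sky", "sea"], meta: { width: 800 } }) yields four rows with the paths title, tags[0], tags[1], and meta.width, which a component could then map to labels and values.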