GPT-4o & vision

The Image Description Generator uses GPT-4o to return structured JSON output. Input details are defined in the toolConfig.ts file, and the output can be automatically displayed by the components/output/OutputLayout.tsx file, regardless of the JSON structure.

Image Description Generator using GPT-4o

Pre-requisites

To run the app, you need Supabase, OpenAI, and Cloudflare R2 storage set up. If you haven’t done this yet, start with the guides below.

Supabase

Set up user authentication & PostgreSQL database using Supabase

OpenAI

Set up OpenAI & use its various models throughout your app

Storage

Set up audio, PDF, and image storage using Cloudflare R2

That’s all you need to get the app running.

Review lib/types/toolconfig.ts to understand the various configuration fields in the demo app.

Building variations

Two features make the app easy to customize and to build variations from:

Automatic input capturing
Automatic output rendering, regardless of JSON structure

Prompt and JSON schema

Start by creating a new prompt and JSON schema for your app. Use the existing demo prompts and JSON schemas as a guide and follow the same structure. Your prompt.ts file should handle user input like this:

prompt.ts

export function generatePrompt(body: any) {
  // Input variables submitted by the form; each name must match a field `name` in toolConfig.ts
  const { descriptionType } = body;

  return (
    "Generate a detailed and engaging description for the provided image. The description should be informative, concise, and tailored to the specified type. Ensure the output is in valid JSON format and adheres strictly to the function schema.\n" +
    "INPUTS:\n" +
    `Description Type: ${descriptionType}\n`
  );
}

Take note of the input variables (descriptionType). We’ll use these later.
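The prompt asks the model to follow a function schema, but the demo's own schemas are not reproduced on this page. The following is a minimal sketch that assumes the OpenAI function-calling format: a name, a description, and a JSON Schema under `parameters`. The schema name `describe_image`, its fields (`title`, `description`, `keywords`), and the `hasRequiredFields` helper are all hypothetical and shown only for illustration.

```typescript
// Hypothetical function schema for the image description tool.
// Field names are illustrative, not the demo app's actual schema.
const functionSchema = {
  name: "describe_image",
  description: "Return a structured description of the provided image.",
  parameters: {
    type: "object",
    properties: {
      title: { type: "string", description: "A short title for the image" },
      description: {
        type: "string",
        description: "The description, written in the requested style",
      },
      keywords: {
        type: "array",
        items: { type: "string" },
        description: "A few keywords that summarize the image",
      },
    },
    required: ["title", "description", "keywords"],
  },
};

// Hypothetical sanity check on parsed model output:
// returns true only if every required key is present.
function hasRequiredFields(output: Record<string, unknown>): boolean {
  return functionSchema.parameters.required.every((key) => key in output);
}
```

A check like this catches incomplete replies before they are stored. Because the output is rendered generically, you can change the schema's fields freely without touching any display code.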

Automatic input capturing

Update toolConfig.ts to include:

The input variables defined in the prompt
- Each input variable in prompt.ts must match the name of a field in toolConfig.ts. See the example below.
The button text for the form
The type of model used

The InputCapture component automatically adds an image upload form, uploads the image to Cloudflare R2, gets back its URL, and sends that URL to GPT-4o for analysis.
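The demo's API route is not shown on this page. The following is a minimal sketch of the last step, sending the R2 image URL to GPT-4o. It assumes the OpenAI Chat Completions API with an `image_url` content part and a forced function call for structured output. `buildVisionRequest` is a hypothetical helper name.

```typescript
// Shape of an OpenAI-style function definition (name, description, JSON Schema).
interface FunctionSchema {
  name: string;
  description: string;
  parameters: Record<string, unknown>;
}

// Hypothetical helper: build a Chat Completions request body that sends the
// prompt plus the uploaded image URL to GPT-4o and requires a function call.
function buildVisionRequest(prompt: string, imageUrl: string, schema: FunctionSchema) {
  return {
    model: "gpt-4o",
    messages: [
      {
        role: "user" as const,
        content: [
          { type: "text" as const, text: prompt },
          { type: "image_url" as const, image_url: { url: imageUrl } },
        ],
      },
    ],
    // Expose the schema as a tool and force the model to call it,
    // so the reply arrives as JSON arguments that follow the schema.
    tools: [{ type: "function" as const, function: schema }],
    tool_choice: { type: "function" as const, function: { name: schema.name } },
  };
}
```

With the official `openai` Node SDK, an object like this can be passed to `openai.chat.completions.create(...)`. The structured result then arrives as a JSON string in `choices[0].message.tool_calls[0].function.arguments`, ready to be parsed and stored.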

Make sure the type field in toolConfig.ts is set correctly. InputCapture.tsx uses it to decide what to render (form fields, file uploads) and which API endpoint to call.

In the demo app, the page.tsx file in the /app folder reads the data from toolConfig.ts and passes it to the InputCapture component, which builds the form from it automatically.

  type: "vision", // options: 'vision' for GPT-4o, 'dalle', 'sdxl', 'groq' & 'gpt'.
  fields: [
    {
      label: "📝 Description Type",
      name: "descriptionType",
      type: "select",
      options: [
        "Short and concise",
        "Detailed and descriptive",
        "Humorous and creative",
      ],
      required: true,
    },
  ],
  submitText: "Generate image description 🌄",
  submitTextGenerating: "Analyzing your image...",
  responseTitle: "Your image description has been generated",
  responseSubTitle:
    "The output below has been automatically rendered based on the JSON schema used by the AI model. You can use this to quickly prototype your application.",
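The following sketch shows why each field's `name` must match the variable destructured in prompt.ts. The submitted values are keyed by `name`, and that object is what `generatePrompt` receives. `FieldConfig` covers only the keys shown in the excerpt above; the real type in lib/types/toolconfig.ts may differ. `buildRequestBody` is a hypothetical helper, not part of the demo app.

```typescript
// Reduced, assumed shape of a form field: only the keys from the excerpt.
interface FieldConfig {
  label: string;
  name: string;
  type: string;
  options?: string[];
  required?: boolean;
}

const fields: FieldConfig[] = [
  {
    label: "📝 Description Type",
    name: "descriptionType",
    type: "select",
    options: ["Short and concise", "Detailed and descriptive", "Humorous and creative"],
    required: true,
  },
];

// Hypothetical helper: collect form values keyed by each field's `name`,
// enforcing `required` and (for selects) the allowed `options`.
function buildRequestBody(
  config: FieldConfig[],
  values: Record<string, string>
): Record<string, string> {
  const body: Record<string, string> = {};
  for (const field of config) {
    const value = values[field.name];
    if (field.required && !value) {
      throw new Error(`Missing required field: ${field.name}`);
    }
    if (field.options && value && !field.options.includes(value)) {
      throw new Error(`Invalid option for ${field.name}: ${value}`);
    }
    if (value) body[field.name] = value;
  }
  return body;
}

// The resulting object can be destructured directly: const { descriptionType } = body;
const body = buildRequestBody(fields, { descriptionType: "Humorous and creative" });
console.log(body.descriptionType); // prints: Humorous and creative
```

If the field were renamed in toolConfig.ts but not in prompt.ts, the destructured `descriptionType` would simply be `undefined`. Keeping the two names in sync matters for exactly this reason.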

Automatic output rendering

The goal is to allow rapid app generation, input capture, and output display for fast prototyping. Then, you can refine the output display for a more polished presentation.

app/(apps)/vision/[id]/page.tsx automatically fetches the data from Supabase based on the UUID and renders the JSON, whatever its structure. The OutputLayout component does the heavy lifting, fetching and displaying the JSON automatically. Review it to see how this works.

app/(apps)/vision/[id]/page.tsx

<OutputLayout params={params} toolConfig={toolConfig} />
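OutputLayout's implementation is not reproduced here. The following is a framework-free sketch of the underlying idea: rendering JSON without knowing its shape comes down to a recursive walk over objects, arrays, and primitives. `renderJson` and `toLabel` are hypothetical names, and the real component renders React elements rather than text lines.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Turn a camelCase or snake_case key into a readable label.
function toLabel(key: string): string {
  const spaced = key.replace(/_/g, " ").replace(/([a-z])([A-Z])/g, "$1 $2");
  return spaced.charAt(0).toUpperCase() + spaced.slice(1);
}

// Recursively render any JSON value as indented, labelled text lines.
function renderJson(value: Json, depth = 0): string[] {
  const pad = "  ".repeat(depth);
  if (Array.isArray(value)) {
    // Arrays: primitives become bullets, nested structures are indented.
    return value.flatMap((item) =>
      item !== null && typeof item === "object"
        ? renderJson(item, depth + 1)
        : [`${pad}- ${String(item)}`]
    );
  }
  if (value !== null && typeof value === "object") {
    // Objects: each key becomes a label; nested values go on following lines.
    return Object.entries(value).flatMap(([key, child]) =>
      child !== null && typeof child === "object"
        ? [`${pad}${toLabel(key)}:`, ...renderJson(child, depth + 1)]
        : [`${pad}${toLabel(key)}: ${String(child)}`]
    );
  }
  return [`${pad}${String(value)}`];
}

const sample = { imageDescription: "A dog on a beach", tags: ["dog", "beach"] };
console.log(renderJson(sample).join("\n"));
// Image Description: A dog on a beach
// Tags:
//   - dog
//   - beach
```

Deriving labels from the JSON keys themselves is what lets a new schema display reasonably with no extra UI work. That makes it a useful default before you build a more polished output view.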
