We’ll use ElevenLabs, Cloudflare R2 storage, and a Supabase database to build our own text-to-speech app.

Prerequisites

To build your own text-to-speech app, you’ll need Supabase, ElevenLabs, and storage (Cloudflare R2) set up. If you haven’t done so yet, start there.

That’s it. Once the core infrastructure is ready, the app is functional and you can access it at the /voice path.
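Because the app depends on all three services, it can help to fail fast when a credential is missing. The sketch below is a minimal TypeScript example, assuming the credentials are supplied as environment variables. The variable names (NEXT_PUBLIC_SUPABASE_URL, ELEVENLABS_API_KEY, R2_BUCKET_NAME and so on) and the findMissingEnvVars helper are hypothetical, so match them to your own .env file.

```typescript
// Hypothetical variable names: adjust them to match your project's .env file.
const REQUIRED_ENV_VARS = [
  "NEXT_PUBLIC_SUPABASE_URL",
  "NEXT_PUBLIC_SUPABASE_ANON_KEY",
  "ELEVENLABS_API_KEY",
  "R2_ACCOUNT_ID",
  "R2_ACCESS_KEY_ID",
  "R2_SECRET_ACCESS_KEY",
  "R2_BUCKET_NAME",
] as const;

// Returns the names of required variables that are missing or blank.
export function findMissingEnvVars(
  env: Record<string, string | undefined>,
  required: readonly string[] = REQUIRED_ENV_VARS,
): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Example: fail fast at startup if anything is missing.
// const missing = findMissingEnvVars(process.env);
// if (missing.length > 0) throw new Error(`Missing env vars: ${missing.join(", ")}`);
```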

App Structure

  1. app/api/voice/*: The key API routes:

    • app/api/voice/models/route.ts: Fetches available ElevenLabs models.
    • app/api/voice/voices/route.ts: Fetches available ElevenLabs voices.
    • app/api/voice/text-to-speech/route.ts: Generates speech from text using ElevenLabs, stores the generation data in Supabase, and deducts credits from the user.
    • app/api/voice/route/route.ts: Uploads the generated audio to Cloudflare R2.
  2. app/voice/*: Contains all front-end logic, including paywall checks and dynamic pages.

  3. components/voice/*: Contains all front-end components specific to the text-to-speech app.
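The text-to-speech route ultimately calls the ElevenLabs REST API. Here is a minimal sketch of that call, assuming a runtime with a global fetch (such as Node 18+ or a Next.js route handler): it POSTs to the public /v1/text-to-speech/{voice_id} endpoint with the xi-api-key header and receives MP3 bytes back. The models and voices routes would use the corresponding GET /v1/models and GET /v1/voices endpoints. The helper names (buildListRequest, buildTtsRequest, generateSpeech) are hypothetical and not taken from the app's code.

```typescript
const ELEVENLABS_BASE_URL = "https://api.elevenlabs.io/v1";

// A plain request description, kept separate from fetch() so it can be tested offline.
export interface HttpRequest {
  url: string;
  method: "GET" | "POST";
  headers: Record<string, string>;
  body?: string;
}

export interface TtsInput {
  apiKey: string;
  voiceId: string;
  modelId: string;
  text: string;
}

// GET request for the model list (models route) or the voice list (voices route).
export function buildListRequest(kind: "models" | "voices", apiKey: string): HttpRequest {
  return {
    url: `${ELEVENLABS_BASE_URL}/${kind}`,
    method: "GET",
    headers: { "xi-api-key": apiKey },
  };
}

// POST request asking ElevenLabs to synthesize `text` with the given voice and model.
export function buildTtsRequest(input: TtsInput): HttpRequest {
  if (input.text.trim().length === 0) {
    throw new Error("Text must not be empty");
  }
  return {
    url: `${ELEVENLABS_BASE_URL}/text-to-speech/${encodeURIComponent(input.voiceId)}`,
    method: "POST",
    headers: {
      "xi-api-key": input.apiKey,
      "Content-Type": "application/json",
      Accept: "audio/mpeg",
    },
    body: JSON.stringify({ text: input.text, model_id: input.modelId }),
  };
}

// Performs the call and returns the raw MP3 bytes, ready to hand to the R2 upload step.
export async function generateSpeech(input: TtsInput): Promise<ArrayBuffer> {
  const req = buildTtsRequest(input);
  const res = await fetch(req.url, { method: req.method, headers: req.headers, body: req.body });
  if (!res.ok) {
    throw new Error(`ElevenLabs request failed (${res.status}): ${await res.text()}`);
  }
  return res.arrayBuffer();
}
```

Separating request construction from the network call keeps the URL, headers, and body easy to unit-test; the voice and model IDs passed in would come from the lists returned by the voices and models routes.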

Features

  • Supports 45 voices by default, with access to 1,000+ additional voices from the ElevenLabs marketplace.
  • Works in 26 languages.
  • Requires user authentication.
  • Uploads generated audio to Cloudflare R2 storage.
  • Stores generation data in the ‘generations’ table in Supabase.
  • Deducts 5 credits from the user per generation (configurable in toolConfig.ts).
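The credit and storage behaviour listed above can be sketched as two small pure functions. This assumes toolConfig exposes a creditsPerGeneration field and that the generations table has the columns shown; both are hypothetical, since the real toolConfig.ts and table schema may differ.

```typescript
// Hypothetical shape of toolConfig.ts; the real file's field names may differ.
export const toolConfig = {
  creditsPerGeneration: 5,
} as const;

// Hypothetical columns of the 'generations' table.
export interface GenerationRow {
  user_id: string;
  text: string;
  voice_id: string;
  model_id: string;
  audio_url: string;
  created_at: string;
}

// Returns the new balance, or throws if the user cannot afford a generation.
export function deductCredits(
  balance: number,
  cost: number = toolConfig.creditsPerGeneration,
): number {
  if (!Number.isInteger(cost) || cost < 0) {
    throw new Error("Cost must be a non-negative integer");
  }
  if (balance < cost) {
    throw new Error(`Insufficient credits: have ${balance}, need ${cost}`);
  }
  return balance - cost;
}

// Builds the row to insert into the 'generations' table, stamping the creation time.
export function buildGenerationRow(
  params: Omit<GenerationRow, "created_at">,
  now: Date = new Date(),
): GenerationRow {
  return { ...params, created_at: now.toISOString() };
}
```

With supabase-js, the row could then be written with supabase.from("generations").insert(row). Reading a balance and writing it back in two separate steps can race when a user sends several requests at once, so you may prefer to perform the deduction atomically in the database (for example, in a Postgres function called through supabase.rpc).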