Text to Speech
Set up your own text to speech app
We’ll use ElevenLabs, Cloudflare R2 storage & Supabase database to create our own Text to Speech app.
Pre-requisites
To build your own Text to Speech app you’ll need to have Supabase, ElevenLabs, and Cloudflare R2 storage set up. If you haven’t, please start with the three setup guides below.
Supabase
Set up user authentication & PostgreSQL database using Supabase
Storage
Set up audio storage using Cloudflare R2
ElevenLabs
Set up ElevenLabs & understand how it’s used throughout the app
That’s it - once the core infrastructure is ready, the app will be functional and you’ll be able to access it under the /voice folder.
App Structure
- app/api/(apps)/voice/*
  Key API routes:
  - app/api/(apps)/voice/models/route.ts: Fetches available ElevenLabs models.
  - app/api/(apps)/voice/voices/route.ts: Fetches available ElevenLabs voices.
  - app/api/(apps)/voice/text-to-speech/route.ts: Generates speech from text using ElevenLabs, stores data in Supabase and reduces credits for the user.
  - app/api/(apps)/voice/route/route.ts: Uploads to Cloudflare R2.
- /app/(apps)/voice/*
  Contains all front-end logic, including paywall checks and dynamic pages.
- /components/voice/*
  Contains all front-end components unique to the text-to-speech app.
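To make the text-to-speech route more concrete, here is a minimal sketch of how a request to the ElevenLabs text-to-speech endpoint could be assembled. The helper name `buildTtsRequest`, its parameters, and the `TtsRequest` type are assumptions made for illustration; the actual code in text-to-speech/route.ts may be organised differently.

```typescript
// Hypothetical helper (not taken from the app's source): assembles the HTTP
// request for the ElevenLabs text-to-speech endpoint without sending it.
interface TtsRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

function buildTtsRequest(
  apiKey: string,
  voiceId: string,
  text: string,
  modelId: string
): TtsRequest {
  // Reject empty input early so no API call (and no credits) are wasted.
  if (!text.trim()) {
    throw new Error("text must not be empty");
  }
  return {
    // The voice ID is part of the URL path.
    url: `https://api.elevenlabs.io/v1/text-to-speech/${encodeURIComponent(voiceId)}`,
    init: {
      method: "POST",
      headers: {
        "xi-api-key": apiKey, // ElevenLabs API key header
        "Content-Type": "application/json",
      },
      // The model is selected with the model_id field in the JSON body.
      body: JSON.stringify({ text, model_id: modelId }),
    },
  };
}
```

The returned `url` and `init` can be passed straight to `fetch(url, init)`; the response body is the generated audio, which a route like this would then upload to Cloudflare R2.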
Features
- Supports 45 voices by default, with access to 1,000+ additional voices from the ElevenLabs marketplace.
- Works in 26 languages.
- Requires user authentication.
- Uploads generated audio to Cloudflare R2 storage.
- Stores generation data in the ‘generations’ table in Supabase.
- Reduces the user’s credits by 5 per generation (configurable in toolConfig.ts).
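The credit deduction in the last item can be sketched as a small pure function. The config field `creditsPerGeneration`, the function `chargeForGeneration`, and the `ChargeResult` type are hypothetical names chosen for this example; the real field in toolConfig.ts and the Supabase update logic may look different.

```typescript
// Hypothetical config shape; the actual field name in toolConfig.ts may differ.
const toolConfig = { creditsPerGeneration: 5 };

interface ChargeResult {
  allowed: boolean;         // whether the generation may proceed
  remainingCredits: number; // balance after the (possible) deduction
}

// Decides whether a user can afford a generation and computes the new balance.
// Persisting the new balance (e.g. to Supabase) is left to the caller.
function chargeForGeneration(
  currentCredits: number,
  cost: number = toolConfig.creditsPerGeneration
): ChargeResult {
  if (currentCredits < cost) {
    // Not enough credits: leave the balance untouched.
    return { allowed: false, remainingCredits: currentCredits };
  }
  return { allowed: true, remainingCredits: currentCredits - cost };
}
```

Keeping the check separate from the database write makes it easy to run before calling ElevenLabs, so a user without enough credits never triggers a paid generation.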