Version 0.2.1

Providers Guide

Comprehensive documentation for all SpeakEasy TTS providers, including setup, configuration, and provider-specific features.

Overview

SpeakEasy supports four TTS providers, each with unique characteristics:

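| Provider | Type | API Key Required | Voices | Quality | Speed |
|---|---|---|---|---|---|
| System | Built-in | ❌ No | macOS voices | Good | Fast |
| OpenAI | API | ✅ Yes | 6 voices | High | Medium |
| ElevenLabs | API | ✅ Yes | Custom | Very High | Medium |
| Groq | API | ✅ Yes | 6 voices | High | Very Fast |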

System Voice (macOS)

Overview

  • Built-in: Uses macOS say command
  • No API key: Works immediately on macOS
  • Direct file generation: Creates .aiff files then plays with afplay
  • Volume control: Playback volume is set per utterance and does not change the system volume
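
The two-step flow described above (render with say, then play with afplay) can be sketched as a pair of argument lists. This is an illustration rather than SpeakEasy's actual implementation: `buildSystemSpeechCommands`, the `Command` shape, and the output path are hypothetical, and passing the volume straight to afplay's `-v` flag is an assumption.

```typescript
// Hypothetical helper: describes the two commands a say -> afplay pipeline might run.
interface Command {
  bin: string;
  args: string[];
}

function buildSystemSpeechCommands(
  text: string,
  voice: string,
  rate: number,
  volume: number,
  outFile: string // assumed to be a temp path ending in .aiff
): [Command, Command] {
  // Step 1: render speech to an AIFF file instead of the speakers.
  const render: Command = {
    bin: 'say',
    args: ['-v', voice, '-r', String(rate), '-o', outFile, text],
  };
  // Step 2: play the file; afplay's -v applies to this playback only.
  const play: Command = {
    bin: 'afplay',
    args: ['-v', String(volume), outFile],
  };
  return [render, play];
}

const [renderCmd, playCmd] = buildSystemSpeechCommands(
  'Hello world', 'Samantha', 180, 0.7, '/tmp/speech.aiff'
);
```

Each `Command` could then be handed to something like Node's `child_process.execFile`, which takes the binary and its argument array separately and so avoids shell quoting issues in the spoken text.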

Configuration

Global config:

json
{
  "providers": {
    "system": {
      "enabled": true,
      "voice": "Samantha"
    }
  }
}

SDK usage:

typescript
const speaker = new SpeakEasy({
  provider: 'system',
  systemVoice: 'Samantha',
  rate: 180,
  volume: 0.7
});

CLI usage:

bash
speakeasy "Hello world" --provider system --voice Samantha --rate 200

Available Voices

Popular voices:

  • Samantha - Default female voice (US English)
  • Alex - Male voice (US English)
  • Victoria - Female voice (US English)
  • Daniel - Male voice (British English)
  • Karen - Female voice (Australian English)
  • Moira - Female voice (Irish English)
  • Tessa - Female voice (South African English)

List all voices:

bash
# Quote the ? so zsh (the default macOS shell) does not treat it as a glob
say -v '?'

Rate Control

System voice uses direct WPM (words per minute) control:

  • Range: 80-400 WPM
  • Default: 180 WPM
  • Implementation: Direct say -r parameter

bash
# Slow speech
speakeasy "Slow speech" --provider system --rate 120

# Fast speech
speakeasy "Fast speech" --provider system --rate 250
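
The range and default listed above could be enforced with a small helper like the one below. It is a sketch under assumptions: `resolveSystemRate` is a hypothetical name, and the guide does not say whether SpeakEasy clamps out-of-range values or rejects them, so clamping is used here only for illustration.

```typescript
// Bounds and default taken from the Rate Control list above.
const SYSTEM_RATE_MIN = 80;
const SYSTEM_RATE_MAX = 400;
const SYSTEM_RATE_DEFAULT = 180;

// Hypothetical helper: returns a WPM value that is safe to pass to `say -r`.
function resolveSystemRate(rate?: number): number {
  if (rate === undefined || Number.isNaN(rate)) {
    return SYSTEM_RATE_DEFAULT;
  }
  // Assumption: out-of-range values are clamped rather than rejected.
  return Math.min(SYSTEM_RATE_MAX, Math.max(SYSTEM_RATE_MIN, Math.round(rate)));
}
```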

Advantages

  • No API key required
  • Fast response (no network calls)
  • Reliable (always available on macOS)
  • Good quality built-in voices
  • No caching needed (already fast)

Limitations

  • macOS only (requires say command)
  • Limited voices compared to API providers
  • No custom voices

OpenAI TTS

Overview

  • API-based: Uses OpenAI's tts-1 model by default
  • High quality: Neural text-to-speech
  • 6 voices: Diverse voice options
  • Caching enabled: Automatic for repeated text
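
To make "caching for repeated text" concrete, here is a minimal in-memory sketch. It does not describe SpeakEasy's real cache: `AudioCache`, `cacheKey`, and the choice of key fields are assumptions, and the synthesizer is synchronous only to keep the example short (a real API call would be asynchronous).

```typescript
// Hypothetical cache key: the same text with a different voice or rate must not collide.
type CacheKeyParts = { provider: string; voice: string; rate: number; text: string };

function cacheKey(parts: CacheKeyParts): string {
  return JSON.stringify([parts.provider, parts.voice, parts.rate, parts.text]);
}

class AudioCache {
  private entries = new Map<string, Uint8Array>();

  // Returns cached audio when available; otherwise synthesizes once and stores the result.
  getOrCreate(parts: CacheKeyParts, synthesize: () => Uint8Array): Uint8Array {
    const key = cacheKey(parts);
    const cached = this.entries.get(key);
    if (cached !== undefined) {
      return cached;
    }
    const audio = synthesize();
    this.entries.set(key, audio);
    return audio;
  }
}
```

Keying on voice and rate as well as text means changing either setting produces fresh audio instead of replaying a stale clip.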

Setup

API key required:

bash
export OPENAI_API_KEY="sk-..."

Global config:

json
{
  "providers": {
    "openai": {
      "enabled": true,
      "voice": "nova",
      "model": "tts-1",
      "apiKey": "sk-..."
    }
  }
}

Available Voices

| Voice | Description | Characteristics |
|---|---|---|
| alloy | Neutral | Balanced, professional |
| echo | Male | Clear, authoritative |
| fable | Expressive | Storytelling, engaging |
| onyx | Deep Male | Rich, commanding |
| nova | Female | Warm, friendly (default) |
| shimmer | Bright Female | Energetic, upbeat |

Usage Examples

SDK:

typescript
const speaker = new SpeakEasy({
  provider: 'openai',
  openaiVoice: 'nova',
  rate: 200,
  apiKeys: {
    openai: process.env.OPENAI_API_KEY
  }
});

await speaker.speak('Hello from OpenAI TTS');

CLI:

bash
speakeasy "Hello world" --provider openai --voice nova
speakeasy "Professional voice" --provider openai --voice alloy --rate 180

Rate Control

OpenAI uses a speed parameter (0.25-4.0), converted from WPM:

Conversion formula:

speed = rate / 200

Examples:

  • 100 WPM → 0.5 speed
  • 200 WPM → 1.0 speed (normal)
  • 400 WPM → 2.0 speed

Rate bounds: 50-800 WPM (0.25-4.0 speed)
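
The conversion and its bounds can be written as one function. This is a sketch of the formula above, not SpeakEasy's source: `wpmToOpenAISpeed` is a hypothetical name, and clamping (rather than throwing) on out-of-range input is an assumption.

```typescript
// speed = rate / 200, limited to the 0.25-4.0 range the speed parameter accepts.
const OPENAI_SPEED_MIN = 0.25;
const OPENAI_SPEED_MAX = 4.0;
const WPM_AT_NORMAL_SPEED = 200;

function wpmToOpenAISpeed(rateWpm: number): number {
  const speed = rateWpm / WPM_AT_NORMAL_SPEED;
  // Assumption: rates outside 50-800 WPM are clamped to the nearest bound.
  return Math.min(OPENAI_SPEED_MAX, Math.max(OPENAI_SPEED_MIN, speed));
}
```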

Models

  • tts-1: Standard quality, faster generation
  • tts-1-hd: Higher quality, slower generation

json
{
  "providers": {
    "openai": {
      "model": "tts-1-hd"
    }
  }
}

Advantages

  • High quality neural voices
  • Diverse voices with distinct characteristics
  • Reliable API with good uptime
  • Automatic caching for repeated text
  • Wide platform support

Limitations

  • API key required (paid service)
  • Network dependent (can fail offline)
  • Rate limits (API quotas)
  • No custom voices (fixed set of 6)

Error Handling

typescript
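// Assumes `say` is imported from the SpeakEasy package.
try {
  await say('Hello', 'openai');
} catch (error) {
  // `error` is `unknown` in strict TypeScript, so narrow it before reading `.message`
  const message = error instanceof Error ? error.message : String(error);
  if (message.includes('API key')) {
    console.error('Set the OPENAI_API_KEY environment variable');
  } else if (message.includes('rate limit')) {
    console.error('API rate limit exceeded');
  } else {
    throw error; // Don't swallow unexpected failures
  }
}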

ElevenLabs

Overview

  • Premium quality: Advanced voice synthesis
  • Custom voices: Create and clone voices
  • Emotional control: Stability and similarity settings
  • Professional grade: Highest quality output

Setup

API key required:

bash
export ELEVENLABS_API_KEY="..."

Global config:

json
{
  "providers": {
    "elevenlabs": {
      "enabled": true,
      "voiceId": "EXAVITQu4vr4xnSDxMaL",
      "modelId": "eleven_monolingual_v1",
      "apiKey": "..."
    }
  }
}

Voice Configuration

Default voice ID:

json
{
  "providers": {
    "elevenlabs": {
      "voiceId": "EXAVITQu4vr4xnSDxMaL"
    }
  }
}

Custom voice ID:

bash
speakeasy "Hello" --provider elevenlabs --voice "your-custom-voice-id"

Usage Examples

SDK:

typescript
const speaker = new SpeakEasy({
  provider: 'elevenlabs',
  elevenlabsVoiceId: 'EXAVITQu4vr4xnSDxMaL',
  rate: 180,
  apiKeys: {
    elevenlabs: process.env.ELEVENLABS_API_KEY
  }
});

CLI:

bash
speakeasy "Premium voice" --provider elevenlabs
speakeasy "Custom voice" --provider elevenlabs --voice "custom-voice-id"

Rate Control

ElevenLabs doesn't have a direct rate parameter. Rate control is simulated through:

  • Text preprocessing
  • Punctuation insertion for pacing
  • Future: API-level rate control when available
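
As a rough illustration of punctuation-based pacing, the sketch below lengthens pauses when the requested rate is slower than a baseline. Everything in it is an assumption: `applyPacing`, the 180 WPM baseline, and the comma-to-ellipsis substitution are illustrative choices, not SpeakEasy's actual preprocessing.

```typescript
// Hypothetical pacing helper; the threshold and substitution are illustrative only.
const BASELINE_WPM = 180;

function applyPacing(text: string, rateWpm: number): string {
  if (rateWpm >= BASELINE_WPM) {
    // Text preprocessing alone cannot speed speech up, so leave the text unchanged.
    return text;
  }
  // Slower than baseline: turn commas into ellipses to suggest longer pauses.
  return text.replace(/,\s*/g, '... ');
}
```

Because the effect depends on how the model interprets punctuation, this kind of pacing is approximate at best.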

Available Models

  • eleven_monolingual_v1 - English optimized
  • eleven_multilingual_v1 - Multiple languages
  • eleven_multilingual_v2 - Latest multilingual

Advantages

  • Highest quality voices available
  • Custom voice cloning capability
  • Emotional expression controls
  • Professional results for production use
  • Multiple languages support

Limitations

  • Most expensive provider
  • Voices are selected by opaque voice ID, not by human-readable name
  • Limited rate control (no native speed parameter)
  • Higher latency due to processing complexity

Finding Voice IDs

ElevenLabs Dashboard:

  1. Go to https://elevenlabs.io/app/voice-lab
  2. Select voice
  3. Copy voice ID from URL or settings

Voice ID format:

EXAVITQu4vr4xnSDxMaL  // Example format

Groq

Overview

  • Fast inference: Optimized for speed
  • OpenAI compatible: Same voice names and API
  • Cost effective: Competitive pricing
  • High performance: Excellent speed/quality balance

Setup

API key required:

bash
export GROQ_API_KEY="gsk_..."

Global config:

json
{
  "providers": {
    "groq": {
      "enabled": true,
      "voice": "nova",
      "model": "tts-1",
      "apiKey": "gsk_..."
    }
  }
}

Available Voices

Uses OpenAI-compatible voice names:

  • alloy, echo, fable, onyx, nova, shimmer
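
Because the six names are shared between providers, one type guard can validate voices for both. This is a sketch: the identifiers below are hypothetical, and falling back to nova for an unknown name is an assumption based on nova being marked as the default elsewhere in this guide.

```typescript
// The six voice names listed in this guide.
const OPENAI_COMPATIBLE_VOICES = ['alloy', 'echo', 'fable', 'onyx', 'nova', 'shimmer'] as const;
type OpenAICompatibleVoice = (typeof OPENAI_COMPATIBLE_VOICES)[number];

// Type guard: narrows an arbitrary string to one of the supported voice names.
function isOpenAICompatibleVoice(name: string): name is OpenAICompatibleVoice {
  return (OPENAI_COMPATIBLE_VOICES as readonly string[]).includes(name);
}

// Assumption: unknown or missing names fall back to a default instead of failing.
function resolveVoice(
  name: string | undefined,
  fallback: OpenAICompatibleVoice = 'nova'
): OpenAICompatibleVoice {
  return name !== undefined && isOpenAICompatibleVoice(name) ? name : fallback;
}
```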

Usage Examples

SDK:

typescript
const speaker = new SpeakEasy({
  provider: 'groq',
  rate: 220, // Speaking rate in WPM (speed = 220 / 200 = 1.1)
  apiKeys: {
    groq: process.env.GROQ_API_KEY
  }
});

CLI:

bash
speakeasy "Fast generation" --provider groq --voice nova

Rate Control

Similar to OpenAI:

speed = rate / 200

Advantages

  • Very fast inference and generation
  • Cost effective compared to other APIs
  • OpenAI compatible voice names
  • Good quality output
  • Reliable performance

Limitations

  • Newer provider (less established)
  • API key required
  • Limited voice selection (6 voices)

Provider Comparison

Quality Ranking

  1. ElevenLabs - Premium, custom voices
  2. OpenAI - High quality, diverse voices
  3. Groq - Good quality, fast generation
  4. System - Good quality, built-in voices

Speed Ranking

  1. System - Instant (local)
  2. Groq - Very fast API
  3. OpenAI - Medium speed API
  4. ElevenLabs - Slower (high quality processing)

Cost Ranking (Free to Expensive)

  1. System - Free (built-in)
  2. Groq - Cost effective
  3. OpenAI - Standard pricing
  4. ElevenLabs - Premium pricing

Fallback Strategy

Automatic Fallbacks

SpeakEasy automatically falls back between providers:

json
{
  "defaults": {
    "fallbackOrder": ["openai", "groq", "system"]
  }
}

Fallback triggers:

  • API key missing
  • Network failure
  • Rate limit exceeded
  • Provider service down
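
The "API key missing" trigger can be checked before any request is made. The sketch below filters a fallback order down to the providers that are usable up front. `usableProviders` and the `ApiKeys` shape are hypothetical, and the remaining triggers (network failure, rate limits, outages) only show up at call time, so they are not modeled here.

```typescript
type Provider = 'system' | 'openai' | 'elevenlabs' | 'groq';
type ApiKeys = Partial<Record<Exclude<Provider, 'system'>, string>>;

// Hypothetical helper: keeps the configured order but drops API providers without a key.
function usableProviders(fallbackOrder: Provider[], apiKeys: ApiKeys): Provider[] {
  return fallbackOrder.filter((provider) => {
    if (provider === 'system') {
      return true; // Built-in voice needs no key
    }
    const key = apiKeys[provider];
    return typeof key === 'string' && key.length > 0;
  });
}
```

With the `fallbackOrder` from the config above and only a Groq key set, the result would skip openai and try groq first, then system.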

Custom Fallback

typescript
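// Assumes `say` is imported from the SpeakEasy package.
async function reliableSpeech(text: string): Promise<void> {
  // `as const` keeps the literal provider names, so no `as any` cast is needed
  const providers = ['openai', 'elevenlabs', 'system'] as const;

  for (const provider of providers) {
    try {
      await say(text, provider);
      return; // Success
    } catch (error) {
      const reason = error instanceof Error ? error.message : String(error);
      console.warn(`${provider} failed (${reason}), trying next...`);
    }
  }

  throw new Error('All providers failed');
}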

Provider Selection Guide

Choose System when:

  • ✅ macOS environment
  • ✅ No API costs desired
  • ✅ Fast, reliable speech needed
  • ✅ Basic quality sufficient

Choose OpenAI when:

  • ✅ High quality needed
  • ✅ Diverse voice options wanted
  • ✅ Established API preferred
  • ✅ Good documentation required

Choose ElevenLabs when:

  • ✅ Highest quality required
  • ✅ Custom voices needed
  • ✅ Professional production use
  • ✅ Budget allows premium pricing

Choose Groq when:

  • ✅ Speed is critical
  • ✅ Cost effectiveness important
  • ✅ Good quality sufficient
  • ✅ Fast iteration needed
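
The checklists above can be condensed into a small decision function. This is one possible reading of the guide, not an official rule: `recommendProvider`, the `Needs` fields, and the priority order (custom voices first, then cost, then speed) are all assumptions.

```typescript
type Provider = 'system' | 'openai' | 'elevenlabs' | 'groq';

// Hypothetical summary of what a project needs from a TTS provider.
interface Needs {
  onMacOS: boolean;
  avoidApiCosts: boolean;
  needsCustomVoices: boolean;
  speedCritical: boolean;
}

function recommendProvider(needs: Needs): Provider {
  // Only ElevenLabs offers custom voices, so that requirement wins outright.
  if (needs.needsCustomVoices) return 'elevenlabs';
  // The system voice is free but requires macOS.
  if (needs.avoidApiCosts && needs.onMacOS) return 'system';
  // Groq is the fastest of the API providers in this guide.
  if (needs.speedCritical) return 'groq';
  // Otherwise default to the established, high-quality option.
  return 'openai';
}
```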

Troubleshooting Providers

Common Issues

System voice not working:

bash
# Check if say command exists
which say

# Test system voice directly
say "Hello world"

# Check voice availability
say -v '?'

API key issues:

bash
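# List API key variables that are set (names only, so secrets stay off the screen)
env | grep -i api_key | cut -d= -f1

# Check key prefixes without printing the rest of the key
echo "$OPENAI_API_KEY" | head -c 3   # Should show "sk-"
echo "$GROQ_API_KEY" | head -c 4     # Should show "gsk_"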
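
The same prefix check can be done in code before a request is sent. This is a sketch: `hasExpectedPrefix` and `KEY_PREFIXES` are hypothetical names, and only the two prefixes shown in this guide are covered (no prefix is documented here for ElevenLabs, so it is left out).

```typescript
// Prefixes taken from the shell checks above.
const KEY_PREFIXES = { openai: 'sk-', groq: 'gsk_' } as const;
type PrefixedProvider = keyof typeof KEY_PREFIXES;

// Returns true when the key is present and starts with the provider's expected prefix.
// A matching prefix does not prove the key is valid, only that it is plausibly the right kind.
function hasExpectedPrefix(provider: PrefixedProvider, key: string | undefined): boolean {
  return typeof key === 'string' && key.startsWith(KEY_PREFIXES[provider]);
}
```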

Network/API failures:

bash
# Test with debug mode
speakeasy "test" --provider openai --debug

# Try fallback providers
speakeasy "test" --provider system  # Always works on macOS

Provider Health Check

bash
# Run comprehensive diagnostics
speakeasy --doctor

# Provider-specific testing
speakeasy "test system" --provider system
speakeasy "test openai" --provider openai
speakeasy "test elevenlabs" --provider elevenlabs
speakeasy "test groq" --provider groq

For detailed configuration options, see the Configuration Guide. For troubleshooting help, see the Troubleshooting Guide.