Absortio

GitHub - robdmac/talkito: TalkiTo lets developers interact with AI systems through speech across multiple channels (terminal, API, phone). It can be used as both a command-line tool and a Python library.

Summary

Main Summary

TalkiTo is a tool for developers that enables multimodal interaction with AI coding assistants such as Claude Code and OpenAI Codex. Its core value is unifying communication with these assistants across several channels and formats: a command-line interface (CLI), a web extension, and a Python library. Developers can talk to their coding assistants by voice or chat with them over Slack or WhatsApp. The tool supports multiple Text-to-Speech (TTS) and Automatic Speech Recognition (ASR) providers and offers extensive options for tailoring the voice experience to the user's preferences. It also integrates remote communication features, which extend the assistants to collaboration and external monitoring scenarios.

Key Points

  • Multimodal Interaction Platform: TalkiTo supports interaction with AI coding assistants through several interfaces: a command-line tool, a web extension that integrates directly into the browser, and a Python library for programmatic use. Developers can therefore bring Claude Code and OpenAI Codex into their preferred workflows, whether for scripting, automation, or direct interaction.
  • Extensive Support for Voice and Recognition Providers

Content

TalkiTo

TalkiTo lets developers talk, slack and whatsapp with Claude Code and OpenAI Codex. It can be used as a command-line tool, a web extension, and as a Python library.

🚀 Quick Install

Option 1: One-liner Install Script (Recommended)

curl -sSL https://raw.githubusercontent.com/robdmac/talkito/main/install.sh | bash

Option 2: PyPI

Then just run:

Install for End Users

From Source (Stable)

# Clone the repository
git clone https://github.com/robdmac/talkito.git
cd talkito

# Create and activate virtual environment (recommended)
python3 -m venv venv
source venv/bin/activate

# Install system dependencies (macOS)
brew install portaudio

# Install package (normal install - gets updates via git pull)
pip install .

# Run this in a directory you want to use claude with
talkito claude

Install for Developers

Editable Install (Development)

# Clone the repository
git clone https://github.com/robdmac/talkito.git
cd talkito

# Create and activate virtual environment (recommended)
python3 -m venv venv
source venv/bin/activate

# Install system dependencies (macOS)
brew install portaudio

# Install in development mode (editable install)
pip install -e .

# Run this in a directory you want to use claude with
talkito claude

or for the web extension run as

Then go to chrome://extensions/ and use "Load unpacked" to load the extensions/chrome/ directory.

Demo Video

TalkiTo Demo

AI Assistant Compatibility

AI Assistant    Method          Status
Claude Code     Terminal        Fully Supported
Codex Cli       Terminal        Fully Supported
bolt.new        Web Extension   Output Only
v0.dev          Web Extension   Output Only
replit.com      Web Extension   Output Only
Other agents    Terminal        In Progress

Run with Claude Code

run talkito claude

Run with Codex Cli

run talkito codex

Run as an MCP server

run talkito --mcp-server

Run the TalkiTo configuration menu

run talkito

Advanced Options

# Disable auto-skip to newer content (auto-skip is on by default)
talkito --dont-auto-skip-tts claude

# Use different TTS providers
talkito --tts-provider polly --tts-voice Matthew --tts-region us-west-2 echo "Hello with AWS"
talkito --tts-provider azure --tts-voice en-US-JennyNeural echo "Hello with Azure"
talkito --tts-provider gcloud --tts-voice en-US-Journey-F echo "Hello with Google"
talkito --tts-provider kittentts --tts-voice expr-voice-3-f echo "Hello with KittenTTS"
talkito --tts-provider kokoro --tts-voice af_heart echo "Hello with Kokoro (local)"

# Use different ASR providers
talkito --asr-provider gcloud --asr-language en-US claude
AZURE_SPEECH_KEY=... AZURE_SPEECH_REGION=eastus talkito --asr-provider azure claude
WHISPER_MODEL=small WHISPER_COMPUTE_TYPE=int8 talkito --asr-provider local_whisper claude
talkito --asr-language es-ES echo "Hola mundo"  # Spanish recognition

# Enable remote communication (configure via environment variables)
talkito --slack-channel '#alerts' python manage.py runserver
talkito --whatsapp-recipients +1234567890 long-running-command
talkito --sms-recipients +1234567890,+0987654321 server-monitor.sh
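
When these invocations are launched from a script, it can help to assemble the argument list programmatically. The following is a minimal sketch, not part of TalkiTo: build_talkito_command is a hypothetical helper, and it only uses flags that appear in the examples above.

```python
import shlex


def build_talkito_command(target, tts_provider=None, tts_voice=None,
                          asr_provider=None, asr_language=None,
                          slack_channel=None):
    """Return an argv list for a talkito invocation (hypothetical helper)."""
    argv = ["talkito"]
    options = [
        ("--tts-provider", tts_provider),
        ("--tts-voice", tts_voice),
        ("--asr-provider", asr_provider),
        ("--asr-language", asr_language),
        ("--slack-channel", slack_channel),
    ]
    for flag, value in options:
        if value is not None:
            argv += [flag, value]
    # Everything after the options is the wrapped command, e.g. ["claude"].
    argv += list(target)
    return argv


cmd = build_talkito_command(["claude"], asr_provider="gcloud", asr_language="en-US")
print(shlex.join(cmd))  # talkito --asr-provider gcloud --asr-language en-US claude
```

The resulting list can be passed to subprocess.run(cmd) as-is. Passing a list rather than a shell string avoids quoting problems with values such as '#alerts'.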

Using tts.py (Standalone TTS)

The TTS module can be used independently for text-to-speech operations:

#!/usr/bin/env python3
import time

import tts

# Initialize TTS
engine = tts.detect_tts_engine()
tts.start_tts_worker(engine)

# Speak text
tts.queue_for_speech("Hello from the TTS module!")

# Give the worker time to finish speaking, then clean up
time.sleep(2)
tts.shutdown_tts()

Using asr.py (Standalone ASR)

The ASR module can be used independently for speech recognition:

#!/usr/bin/env python3
import time

import asr

# Define callback for recognized text
def handle_text(text):
    print(f"You said: {text}")

# Start dictation
asr.start_dictation(handle_text)

# Keep running (press Ctrl+C to stop)
try:
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    asr.stop_dictation()
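
If the recognized text needs more processing than a print, one option is to have the callback hand it to the main loop through a queue. The sketch below assumes the callback may be invoked from a background thread, which the README does not state. recognized, drain, and run_dictation are hypothetical names. Only asr.start_dictation and asr.stop_dictation come from the example above.

```python
import queue

recognized = queue.Queue()


def handle_text(text):
    """Callback for asr.start_dictation: hand the text to the main loop."""
    recognized.put(text)


def drain(q):
    """Return every item currently waiting in the queue without blocking."""
    items = []
    while True:
        try:
            items.append(q.get_nowait())
        except queue.Empty:
            return items


def run_dictation(poll_seconds=0.5):
    """Start dictation and print phrases until Ctrl+C (requires asr.py)."""
    import time

    import asr  # TalkiTo's module, imported as in the example above

    asr.start_dictation(handle_text)
    try:
        while True:
            for phrase in drain(recognized):
                print(f"You said: {phrase}")
            time.sleep(poll_seconds)
    except KeyboardInterrupt:
        asr.stop_dictation()
```

Call run_dictation() from a directory where asr.py is importable. The queue keeps the callback short and moves any slower work into the polling loop.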

Provider Configuration

Text-to-Speech (TTS) Providers

System TTS (Default)

  • macOS: Uses built-in say command
  • Linux: Uses espeak, festival, or flite (install via package manager)
  • Setup: No API key needed
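
To check which of these commands is present on a machine, a short probe with the standard library is enough. This is an illustrative sketch, not TalkiTo's own detection logic (tts.detect_tts_engine may behave differently). The preference order simply follows the list above and is an assumption.

```python
import shutil

# Candidate commands from the list above, in the order it gives them.
SYSTEM_TTS_COMMANDS = ("say", "espeak", "festival", "flite")


def find_system_tts(which=shutil.which):
    """Return the first system TTS command found on PATH, or None."""
    for name in SYSTEM_TTS_COMMANDS:
        if which(name):
            return name
    return None


print(find_system_tts() or "no system TTS command found")
```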

OpenAI TTS

AWS Polly

Azure Speech Services

Google Cloud Text-to-Speech

ElevenLabs

  • Get API Key: https://elevenlabs.io/
  • Setup: Set ELEVENLABS_API_KEY
  • Voices: Various voice IDs available
  • Usage: Configure in code or .env file

Deepgram

  • Get API Key: https://deepgram.com/
  • Setup: Set DEEPGRAM_API_KEY
  • Voices: aura-asteria-en, aura-luna-en, aura-stella-en, and more
  • Usage: --tts-provider deepgram --tts-voice aura-asteria-en

KittenTTS (Local / Offline)

  • Install: pip install https://github.com/KittenML/KittenTTS/releases/download/0.1/kittentts-0.1.0-py3-none-any.whl soundfile phonemizer
  • Setup: No API key required. First run prompts to download the selected model (default kitten-tts-nano-0.2) into the Hugging Face cache. Configure KITTENTTS_MODEL and KITTENTTS_VOICE to pick different quality/voice options.
  • Best for: Ultra-lightweight CPU-only voices that stay on-device.
  • Usage: KITTENTTS_MODEL=kitten-tts-nano-0.2 talkito --tts-provider kittentts --tts-voice expr-voice-3-f

Kokoro (Local / Offline)

  • Install: pip install 'kokoro>=0.9.4' soundfile phonemizer
  • Setup: No API key required. TalkiTo will download Kokoro weights the first time you run it (set KOKORO_LANGUAGE, KOKORO_VOICE, KOKORO_SPEED to control defaults).
  • Best for: High-quality multilingual voices without sending audio to a cloud provider.
  • Usage: talkito --tts-provider kokoro --tts-voice af_heart --tts-language en-US

Automatic Speech Recognition (ASR) Providers

Google Speech Recognition (Default)

  • Free: No API key required
  • Limitations: Best for short utterances, requires internet
  • Usage: Default when no provider specified

Google Cloud Speech-to-Text

AssemblyAI

  • Get API Key: https://www.assemblyai.com/
  • Setup: Set ASSEMBLYAI_API_KEY
  • Features: Real-time transcription, speaker detection
  • Usage: Configure in code or .env file

Deepgram

  • Get API Key: https://deepgram.com/
  • Setup: Set DEEPGRAM_API_KEY
  • Features: Fast, accurate real-time transcription
  • Usage: Configure in code or .env file

Houndify

  • Get Credentials: https://www.houndify.com/
  • Setup: Set HOUNDIFY_CLIENT_ID and HOUNDIFY_CLIENT_KEY
  • Features: Natural language understanding
  • Usage: --asr-provider houndify

AWS Transcribe

Azure Speech Services

Local Whisper (On-Device)

  • Install: pip install faster-whisper (default) or WHISPER_COREML=1 pip install pywhispercpp for Apple Silicon/CoreML acceleration
  • Setup: No API key required. Configure WHISPER_MODEL (e.g., small, medium), WHISPER_DEVICE (cpu, cuda, or mps), and WHISPER_COMPUTE_TYPE (int8, int8_float16, etc.). Models are cached locally and TalkiTo will prompt before downloading unless TALKITO_AUTO_APPROVE_DOWNLOADS=1.
  • Usage: WHISPER_MODEL=small WHISPER_COMPUTE_TYPE=int8 talkito --asr-provider local_whisper
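
When TalkiTo is started from a script, the same variables can be set on a copy of the environment instead of on the command line. A minimal sketch under stated assumptions: whisper_env is a hypothetical helper, and the accepted device values are only the three named in the Setup bullet.

```python
import os

KNOWN_DEVICES = {"cpu", "cuda", "mps"}  # values listed in the Setup bullet


def whisper_env(model="small", device="cpu", compute_type="int8", base=None):
    """Return a copy of the environment with the Whisper variables set."""
    if device not in KNOWN_DEVICES:
        raise ValueError(f"unsupported WHISPER_DEVICE: {device!r}")
    env = dict(os.environ if base is None else base)
    env.update({
        "WHISPER_MODEL": model,
        "WHISPER_DEVICE": device,
        "WHISPER_COMPUTE_TYPE": compute_type,
    })
    return env


env = whisper_env(model="small", compute_type="int8", base={})
print(sorted(env.items()))
```

The returned mapping can be passed as env= to subprocess.run(["talkito", "--asr-provider", "local_whisper", "claude"], env=env). Omit base so that the rest of the current environment, including PATH, is carried over.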

Communication Providers (Remote Interaction)

Twilio SMS

  • Get Account: https://www.twilio.com/try-twilio
  • Setup: Set TWILIO_ACCOUNT_SID, TWILIO_AUTH_TOKEN, and TWILIO_PHONE_NUMBER. You will need a verified number to avoid being filtered.
  • Features: Send command output via SMS, receive input via SMS
  • Usage: --sms-recipients +1234567890

Twilio WhatsApp

  • Get Started: https://www.twilio.com/whatsapp
  • Setup Instructions: Run talkito --setup-whatsapp for detailed setup guide
  • Required Environment Variables:
    • TWILIO_ACCOUNT_SID: Your Twilio account SID
    • TWILIO_AUTH_TOKEN: Your Twilio auth token
    • TWILIO_WHATSAPP_NUMBER: Twilio's WhatsApp number (usually +14155238886)
    • WHATSAPP_RECIPIENTS: Your WhatsApp number
    • ZROK_RESERVED_TOKEN: Your zrok reserved share token
  • Quick Setup:
    • Join Twilio WhatsApp Sandbox at https://www.twilio.com/console/sms/whatsapp/sandbox
    • Send the join code via WhatsApp to +1 415 523 8886
    • Install zrok and create a reserved share: zrok reserve public http://localhost:8080
    • Set webhook URL in Twilio Console to: https://YOUR-TOKEN.share.zrok.io/whatsapp
  • Usage: --whatsapp-recipients +1234567890
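
Before starting a session, it can be useful to confirm that the variables listed above are set and to print the webhook URL to paste into the Twilio Console. The helpers below are hypothetical and not part of TalkiTo. The URL pattern is copied from the Quick Setup step and assumes the reserved token is used as the subdomain.

```python
import os

REQUIRED_WHATSAPP_VARS = (
    "TWILIO_ACCOUNT_SID",
    "TWILIO_AUTH_TOKEN",
    "TWILIO_WHATSAPP_NUMBER",
    "WHATSAPP_RECIPIENTS",
    "ZROK_RESERVED_TOKEN",
)


def missing_whatsapp_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_WHATSAPP_VARS if not env.get(name)]


def webhook_url(zrok_token):
    """Build the webhook URL following the pattern in the Quick Setup step."""
    return f"https://{zrok_token}.share.zrok.io/whatsapp"


missing = missing_whatsapp_vars()
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("Webhook URL:", webhook_url(os.environ["ZROK_RESERVED_TOKEN"]))
```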

Slack

  • Create App: https://api.slack.com/apps
  • Setup: Set SLACK_BOT_TOKEN and optionally SLACK_APP_TOKEN
  • Features: Send output to channels, receive commands
  • Usage: --slack-channel '#channel-name'

Environment Configuration

TalkiTo supports two environment files:

  • .env - Primary configuration (takes precedence)
  • .talkito.env - Secondary configuration (won't override .env)

Copy .env.example to .env and add your API keys:

cp .env.example .env
# Edit .env with your API keys
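
The precedence rule between the two files can be illustrated with a small merge. This is a simplified sketch and not TalkiTo's actual loader: parse_env_text and merge_env_files are hypothetical names, the parser only handles plain KEY=value lines, and how variables already set in the shell interact with either file is not covered.

```python
def parse_env_text(text):
    """Parse KEY=value lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def merge_env_files(primary_text, secondary_text):
    """Merge two env files so the primary (.env) wins on conflicts."""
    merged = parse_env_text(secondary_text)        # .talkito.env first
    merged.update(parse_env_text(primary_text))    # .env overrides it
    return merged


dotenv = "DEEPGRAM_API_KEY=from-dotenv\n"
talkito_env = "DEEPGRAM_API_KEY=from-talkito-env\nSLACK_BOT_TOKEN=xoxb-example\n"
print(merge_env_files(dotenv, talkito_env))
```

In this example the DEEPGRAM_API_KEY value from .env is kept, and SLACK_BOT_TOKEN is filled in from .talkito.env because .env does not define it.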

For WhatsApp setup with zrok tunneling:

  • ZROK_RESERVED_TOKEN: Your zrok reserved share token for webhook tunneling

Requirements

  • Python 3.10+
  • macOS (with say command) or Linux (with espeak, festival, or flite)
  • Optional: SpeechRecognition and pyaudio for ASR support
  • Optional: Provider-specific Python packages (installed as needed)
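
These requirements can be checked from Python before installing. The sketch below uses only the standard library. check_requirements is a hypothetical helper, and it looks up SpeechRecognition under its import name, speech_recognition.

```python
import importlib.util
import sys

# Package name as listed above -> module name used at import time.
OPTIONAL_MODULES = {
    "SpeechRecognition": "speech_recognition",
    "pyaudio": "pyaudio",
}


def check_requirements(version_info=sys.version_info):
    """Report whether Python is 3.10+ and which optional packages exist."""
    report = {"python_ok": tuple(version_info[:2]) >= (3, 10)}
    for package, module in OPTIONAL_MODULES.items():
        report[package] = importlib.util.find_spec(module) is not None
    return report


print(check_requirements())
```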

Contributing

See CONTRIBUTING.md for details on our code of conduct and the process for submitting pull requests.

License

This project is licensed under the GNU Affero General Public License v3.0 or later - see the LICENSE file for details.

Copyright (C) 2025 Robert Macrae

Source: GitHub