Open-source framework for intelligent speech interaction
A text-to-speech, speech-to-text and speech-to-speech library
Large Audio Language Model built for natural interactions
Multi-modal large language model designed for audio understanding
Tokenizer-Free TTS for Multilingual Speech Generation
Controllable & emotion-expressive zero-shot TTS
Open speech-to-speech models and pipelines by Hugging Face
Framework for building real-time voice and multimodal AI agents
Transforming Multimodal Content into Captivating Multilingual Audio
Model capable of understanding text, audio, vision, and video
Translate a video from one language to another and embed dubbing
A Systematic Framework for Interactive World Modeling
Offline text-to-speech synthesis for Python
Open Source Speech Language Model
TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
A fast TTS architecture with conditional flow matching
Industrial-level controllable zero-shot text-to-speech system
Synchronized Translation for Videos
Clone a voice in 5 seconds to generate arbitrary speech in real-time
Instant voice cloning by MIT and MyShell. Audio foundation model
Interface for OuteTTS models
A TTS model capable of generating ultra-realistic dialogue
A sound cloning tool with a web interface, using your voice
A high-quality rapid TTS voice cloning model
Dockerized FastAPI wrapper for the Kokoro-82M text-to-speech model