VITS is the research implementation of “VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech,” a widely used neural TTS architecture. Traditional two-stage systems train an acoustic model and a vocoder separately. VITS instead trains a single end-to-end model that maps text directly to a waveform, combining a conditional variational autoencoder with normalizing flows and adversarial training. Because the model generates audio in parallel rather than sample by sample, inference is fast, and the resulting speech quality rivals or surpasses that of many two-stage systems. The repository provides training and inference pipelines for two common datasets, LJ Speech (single-speaker) and VCTK (multi-speaker), including filelists, configs, and preprocessing scripts. It also includes monotonic alignment search code and grapheme-to-phoneme (g2p) preprocessing, both of which are needed to align text with speech in an end-to-end setup.

Features

  • End-to-end TTS model combining conditional VAE, normalizing flows, and adversarial training
  • Parallel waveform generation with naturalness comparable to or better than classic two-stage pipelines
  • Ready-made training recipes for the LJ Speech (single-speaker) and VCTK (multi-speaker) datasets
  • Monotonic alignment search implementation and phoneme preprocessing scripts
  • PyTorch-based code suitable for research, modification, and experimental extensions
  • Widely adopted baseline architecture for many derivative and improved TTS systems
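
The monotonic alignment search mentioned above can be illustrated with a small dynamic-programming sketch. This is not the repository's code. It is a minimal pure-Python version under stated assumptions: the function name `monotonic_alignment_search` and the input layout (`log_lik[i][j]` is the log-likelihood of audio frame j under text token i) are hypothetical. The alignment must start at the first token, end at the last, never move backward, and never skip a token.

```python
import math


def monotonic_alignment_search(log_lik):
    """Return, for each frame, the index of the text token it is aligned to.

    log_lik[i][j] is assumed to hold the log-likelihood of frame j under
    text token i (hypothetical layout chosen for this sketch).
    """
    n_tokens = len(log_lik)
    n_frames = len(log_lik[0])
    if n_frames < n_tokens:
        raise ValueError("need at least one frame per text token")

    neg_inf = -math.inf
    # q[i][j]: best cumulative score of any valid path whose frame j sits on token i.
    q = [[neg_inf] * n_frames for _ in range(n_tokens)]
    q[0][0] = log_lik[0][0]

    for j in range(1, n_frames):
        # Token i is reachable at frame j only if i <= j (one step per frame at most).
        for i in range(min(j + 1, n_tokens)):
            stay = q[i][j - 1]                                # remain on the same token
            advance = q[i - 1][j - 1] if i > 0 else neg_inf   # move to the next token
            q[i][j] = max(stay, advance) + log_lik[i][j]

    # Backtrack from the last token at the last frame.
    alignment = [0] * n_frames
    i = n_tokens - 1
    for j in range(n_frames - 1, -1, -1):
        alignment[j] = i
        if j > 0 and i > 0 and (i == j or q[i - 1][j - 1] > q[i][j - 1]):
            i -= 1
    return alignment


if __name__ == "__main__":
    # Three tokens, five frames; each row favors a different stretch of frames.
    scores = [
        [0, 0, -5, -5, -5],
        [-5, -5, 0, 0, -5],
        [-5, -5, -5, -5, 0],
    ]
    print(monotonic_alignment_search(scores))
```

The per-frame token indices can be turned into per-token durations by counting how many frames map to each token. A production implementation would typically vectorize or compile this loop, since it runs on every training batch.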

Categories

Text to Speech

License

MIT License

Additional Project Details

Programming Language

Python

Related Categories

Python Text to Speech Software

Registered

2025-11-28