GitHub - myshell-ai/MeloTTS: High-quality multi-lingual text-to-speech library by MyShell.ai. Support English, Spanish, French, Chinese, Japanese and Korean.
Introduction
MeloTTS is a high-quality multi-lingual text-to-speech library by MyShell.ai. Supported languages include:
| Language | Example |
|---|---|
| English (American) | Link |
| English (British) | Link |
| English (Indian) | Link |
| English (Australian) | Link |
| English (Default) | Link |
| Spanish | Link |
| French | Link |
| Chinese (mix EN) | Link |
| Japanese | Link |
| Korean | Link |
Some other features include:
- The Chinese speaker supports mixed Chinese and English.
- Fast enough for CPU real-time inference.
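One way to check the real-time claim on your own machine is to compare synthesis time with the length of the audio produced. The sketch below is not part of MeloTTS: the helper names `wav_duration_seconds` and `real_time_factor` are hypothetical, and it assumes the output is a PCM WAV file that Python's standard `wave` module can read.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def real_time_factor(synthesis_seconds, audio_seconds):
    """Synthesis time divided by audio length.

    A value below 1.0 means the audio was generated faster than
    it takes to play back.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds
```

To use it, time a synthesis call with `time.perf_counter()`, then pass the elapsed time and `wav_duration_seconds(output_path)` to `real_time_factor`.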
Install on Linux or macOS
git clone https://github.com/myshell-ai/MeloTTS.git
cd MeloTTS
pip install -e .
python -m unidic download
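After installing, a quick sanity check can confirm that the packages are importable from the current environment. This is a minimal sketch, not an official MeloTTS tool: `missing_modules` is a hypothetical helper, and the module names `melo` and `unidic` are taken from the commands and imports used in this README.

```python
import importlib.util


def missing_modules(names=("melo", "unidic")):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

An empty list from `missing_modules()` suggests the install step succeeded. It does not verify that the unidic dictionary data was downloaded.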
We welcome the open-source community to make this repo Windows compatible. If you find this repo useful, please consider contributing to the repo.
Usage
An unofficial live demo is hosted on Hugging Face Spaces.
WebUI
The WebUI supports multiple languages and voices. First, follow the installation steps. Then, simply run:
melo-ui
# Or: python melo/app.py
CLI
You may use the MeloTTS CLI to interact with MeloTTS. The CLI may be invoked using either melotts or melo. Here are some examples:
Read English text:
melo "Text to read" output.wav
Specify a language:
melo "Text to read" output.wav --language EN
Specify a speaker:
melo "Text to read" output.wav --language EN --speaker EN-US
melo "Text to read" output.wav --language EN --speaker EN-AU
The available speakers are: EN-Default, EN-US, EN-BR, EN_INDIA, and EN-AU.
Specify a speed:
melo "Text to read" output.wav --language EN --speaker EN-US --speed 1.5
melo "Text to read" output.wav --speed 1.5
Use a different language:
melo "text-to-speech 领域近年来发展迅速" zh.wav -l ZH
Load from a file:
melo file.txt out.wav --file
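If you drive the CLI from a script, it can help to assemble the argument list in one place. The sketch below only uses the flags shown in the examples above (`--language`, `--speaker`, `--speed`, `--file`). The function `build_melo_command` is a hypothetical helper, not part of MeloTTS.

```python
def build_melo_command(text, output, language=None, speaker=None,
                       speed=None, from_file=False):
    """Build an argument list for the `melo` CLI.

    `text` is the text to read, or a path to a text file when
    `from_file` is True. Options left as None are omitted.
    """
    cmd = ["melo", text, output]
    if language is not None:
        cmd += ["--language", language]
    if speaker is not None:
        cmd += ["--speaker", speaker]
    if speed is not None:
        cmd += ["--speed", str(speed)]
    if from_file:
        cmd.append("--file")
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Passing a list rather than a shell string avoids quoting problems with text that contains spaces or non-ASCII characters.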
The full API documentation may be found using:
Python API
English with Multiple Accents
from melo.api import TTS
# Speed is adjustable
speed = 1.0
# CPU is sufficient for real-time inference.
# You can set it manually to 'cpu' or 'cuda' or 'cuda:0' or 'mps'
device = 'auto' # Will automatically use GPU if available
# English
text = "Did you ever hear a folk tale about a giant turtle?"
model = TTS(language='EN', device=device)
speaker_ids = model.hps.data.spk2id
# American accent
output_path = 'en-us.wav'
model.tts_to_file(text, speaker_ids['EN-US'], output_path, speed=speed)
# British accent
output_path = 'en-br.wav'
model.tts_to_file(text, speaker_ids['EN-BR'], output_path, speed=speed)
# Indian accent
output_path = 'en-india.wav'
model.tts_to_file(text, speaker_ids['EN_INDIA'], output_path, speed=speed)
# Australian accent
output_path = 'en-au.wav'
model.tts_to_file(text, speaker_ids['EN-AU'], output_path, speed=speed)
# Default accent
output_path = 'en-default.wav'
model.tts_to_file(text, speaker_ids['EN-Default'], output_path, speed=speed)
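The accent example repeats the same call once per speaker, so it could also be written as a loop. The sketch below assumes that `model.hps.data.spk2id` behaves like a mapping with `keys()` and item access, as the indexing above suggests. `synthesize_all_speakers` is a hypothetical helper, and its file-naming rule (lowercase, underscores replaced by hyphens) simply mirrors the file names used above.

```python
from pathlib import Path


def synthesize_all_speakers(model, text, out_dir=".", speed=1.0):
    """Write one WAV file per speaker exposed by `model`.

    `model` is expected to look like the TTS object used above:
    it needs `hps.data.spk2id` and a `tts_to_file` method.
    Returns the list of output paths.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    speaker_ids = model.hps.data.spk2id
    written = []
    for name in speaker_ids.keys():
        # e.g. 'EN_INDIA' -> 'en-india.wav'
        path = out_dir / f"{name.lower().replace('_', '-')}.wav"
        model.tts_to_file(text, speaker_ids[name], str(path), speed=speed)
        written.append(path)
    return written
```

With the `model` and `text` from the example above, `synthesize_all_speakers(model, text, "out")` would write one file per accent into `out/`.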
Spanish
from melo.api import TTS
# Speed is adjustable
speed = 1.0
# CPU is sufficient for real-time inference.
# You can also change to cuda:0
device = 'cpu'
text = "El resplandor del sol acaricia las olas, pintando el cielo con una paleta deslumbrante."
model = TTS(language='ES', device=device)
speaker_ids = model.hps.data.spk2id
output_path = 'es.wav'
model.tts_to_file(text, speaker_ids['ES'], output_path, speed=speed)
French
from melo.api import TTS
# Speed is adjustable
speed = 1.0
device = 'cpu' # or cuda:0
text = "La lueur dorée du soleil caresse les vagues, peignant le ciel d'une palette éblouissante."
model = TTS(language='FR', device=device)
speaker_ids = model.hps.data.spk2id
output_path = 'fr.wav'
model.tts_to_file(text, speaker_ids['FR'], output_path, speed=speed)
Chinese
from melo.api import TTS
# Speed is adjustable
speed = 1.0
device = 'cpu' # or cuda:0
text = "我最近在学习machine learning,希望能够在未来的artificial intelligence领域有所建树。"
model = TTS(language='ZH', device=device)
speaker_ids = model.hps.data.spk2id
output_path = 'zh.wav'
model.tts_to_file(text, speaker_ids['ZH'], output_path, speed=speed)
Japanese
from melo.api import TTS
# Speed is adjustable
speed = 1.0
device = 'cpu' # or cuda:0
text = "彼は毎朝ジョギングをして体を健康に保っています。"
model = TTS(language='JP', device=device)
speaker_ids = model.hps.data.spk2id
output_path = 'jp.wav'
model.tts_to_file(text, speaker_ids['JP'], output_path, speed=speed)
Korean
from melo.api import TTS
# Speed is adjustable
speed = 1.0
device = 'cpu' # or cuda:0
text = "안녕하세요! 오늘은 날씨가 정말 좋네요."
model = TTS(language='KR', device=device)
speaker_ids = model.hps.data.spk2id
output_path = 'kr.wav'
model.tts_to_file(text, speaker_ids['KR'], output_path, speed=speed)
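The per-language examples above differ only in the language code, the speaker key, and the text, so they could be folded into a single helper. The following is a sketch under assumptions, not part of the MeloTTS API: `EXAMPLE_SPEAKERS` and `synthesize` are hypothetical names, the mapping lists only the codes and speaker keys that appear in this README, and `tts_factory` is a hook added here purely so the function can be exercised without loading a real model.

```python
# Language code -> speaker key, as used in the examples above.
EXAMPLE_SPEAKERS = {
    "EN": "EN-Default",
    "ES": "ES",
    "FR": "FR",
    "ZH": "ZH",
    "JP": "JP",
    "KR": "KR",
}


def synthesize(language, text, output_path, speaker=None,
               device="cpu", speed=1.0, tts_factory=None):
    """Synthesize `text` in `language` and write it to `output_path`."""
    if language not in EXAMPLE_SPEAKERS:
        raise ValueError(f"unsupported language code: {language!r}")
    if tts_factory is None:
        # Imported lazily so this module loads even without MeloTTS installed.
        from melo.api import TTS
        tts_factory = TTS
    model = tts_factory(language=language, device=device)
    speaker = speaker or EXAMPLE_SPEAKERS[language]
    speaker_id = model.hps.data.spk2id[speaker]
    model.tts_to_file(text, speaker_id, output_path, speed=speed)
    return output_path
```

For example, `synthesize("FR", "Bonjour", "fr.wav")` would follow the same steps as the French example. Note that this helper loads a new model on every call, so for batches in one language it is cheaper to build the model once and reuse it.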
License
This library is released under the MIT License, which means it is free for both commercial and non-commercial use.
Acknowledgements
This implementation is based on several excellent projects: TTS, VITS, VITS2, and Bert-VITS2. We appreciate their awesome work!
Source: GitHub
