GitHub - myshell-ai/MeloTTS: High-quality multi-lingual text-to-speech library by MyShell.ai. Support English, Spanish, French, Chinese, Japanese and Korean.

https://github.com/myshell-ai/MeloTTS • Feb 27, 2024 06:01

Extracto

High-quality multi-lingual text-to-speech library by MyShell.ai. Support English, Spanish, French, Chinese, Japanese and Korean. - myshell-ai/MeloTTS

Contenido

Introduction

MeloTTS is a high-quality multi-lingual text-to-speech library by MyShell.ai. Supported languages include:

Language	Example
English (American)	Link
English (British)	Link
English (Indian)	Link
English (Australian)	Link
English (Default)	Link
Spanish	Link
French	Link
Chinese (mix EN)	Link
Japanese	Link
Korean	Link

Some other features include:

The Chinese speaker supports mixed Chinese and English.
Fast enough for CPU real-time inference.

Install on Linux or macOS

git clone git+https://github.com/myshell-ai/MeloTTS.git
cd MeloTTS
pip install -e .
python -m unidic download

We welcome the open-source community to make this repo Windows compatible. If you find this repo useful, please consider contributing to the repo.

Usage

An unofficial live demo is hosted on Hugging Face Spaces.

WebUI

The WebUI supports muliple languages and voices. First, follow the installation steps. Then, simply run:

melo-ui
# Or: python melo/app.py

CLI

You may use the MeloTTS CLI to interact with MeloTTS. The CLI may be invoked using either melotts or melo. Here are some examples:

Read English text:

melo "Text to read" output.wav

Specify a language:

melo "Text to read" output.wav --language EN

Specify a speaker:

melo "Text to read" output.wav --language EN --speaker EN-US
melo "Text to read" output.wav --language EN --speaker EN-AU

The available speakers are: EN-Default, EN-US, EN-BR, EN-INDIA EN-AU.

Specify a speed:

melo "Text to read" output.wav --language EN --speaker EN-US --speed 1.5
melo "Text to read" output.wav --speed 1.5

Use a different language:

melo "text-to-speech 领域近年来发展迅速" zh.wav -l ZH

Load from a file:

melo file.txt out.wav --file

The full API documentation may be found using:

Python API

English with Multiple Accents

from melo.api import TTS

# Speed is adjustable
speed = 1.0

# CPU is sufficient for real-time inference.
# You can set it manually to 'cpu' or 'cuda' or 'cuda:0' or 'mps'
device = 'auto' # Will automatically use GPU if available

# English 
text = "Did you ever hear a folk tale about a giant turtle?"
model = TTS(language='EN', device=device)
speaker_ids = model.hps.data.spk2id

# American accent
output_path = 'en-us.wav'
model.tts_to_file(text, speaker_ids['EN-US'], output_path, speed=speed)

# British accent
output_path = 'en-br.wav'
model.tts_to_file(text, speaker_ids['EN-BR'], output_path, speed=speed)

# Indian accent
output_path = 'en-india.wav'
model.tts_to_file(text, speaker_ids['EN_INDIA'], output_path, speed=speed)

# Australian accent
output_path = 'en-au.wav'
model.tts_to_file(text, speaker_ids['EN-AU'], output_path, speed=speed)

# Default accent
output_path = 'en-default.wav'
model.tts_to_file(text, speaker_ids['EN-Default'], output_path, speed=speed)

Spanish

from melo.api import TTS

# Speed is adjustable
speed = 1.0

# CPU is sufficient for real-time inference.
# You can also change to cuda:0
device = 'cpu'

text = "El resplandor del sol acaricia las olas, pintando el cielo con una paleta deslumbrante."
model = TTS(language='ES', device=device)
speaker_ids = model.hps.data.spk2id

output_path = 'es.wav'
model.tts_to_file(text, speaker_ids['ES'], output_path, speed=speed)

French

from melo.api import TTS

# Speed is adjustable
speed = 1.0
device = 'cpu' # or cuda:0

text = "La lueur dorée du soleil caresse les vagues, peignant le ciel d'une palette éblouissante."
model = TTS(language='FR', device=device)
speaker_ids = model.hps.data.spk2id

output_path = 'fr.wav'
model.tts_to_file(text, speaker_ids['FR'], output_path, speed=speed)

Chinese

from melo.api import TTS

# Speed is adjustable
speed = 1.0
device = 'cpu' # or cuda:0

text = "我最近在学习machine learning，希望能够在未来的artificial intelligence领域有所建树。"
model = TTS(language='ZH', device=device)
speaker_ids = model.hps.data.spk2id

output_path = 'zh.wav'
model.tts_to_file(text, speaker_ids['ZH'], output_path, speed=speed)

Japanese

from melo.api import TTS

# Speed is adjustable
speed = 1.0
device = 'cpu' # or cuda:0

text = "彼は毎朝ジョギングをして体を健康に保っています。"
model = TTS(language='JP', device=device)
speaker_ids = model.hps.data.spk2id

output_path = 'jp.wav'
model.tts_to_file(text, speaker_ids['JP'], output_path, speed=speed)

Korean

from melo.api import TTS

# Speed is adjustable
speed = 1.0
device = 'cpu' # or cuda:0

text = "안녕하세요! 오늘은 날씨가 정말 좋네요."
model = TTS(language='KR', device=device)
speaker_ids = model.hps.data.spk2id

output_path = 'kr.wav'
model.tts_to_file(text, speaker_ids['KR'], output_path, speed=speed)

License

This library is under MIT License, which means it is free for both commercial and non-commercial use.

Acknowledgements

This implementation is based on several excellent projects, TTS, VITS, VITS2 and Bert-VITS2. We appreciate their awesome work!