egrecho.utils.text.cleaners

This code is modified from NeMo and Coqui.

egrecho.utils.text.cleaners.basic_cleaners(string)

Basic pipeline that lowercases and collapses whitespace without transliteration.
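The two steps named above can be illustrated with a minimal sketch. It is not the library's source: the function name `basic_cleaners_sketch` is hypothetical, and trimming leading and trailing whitespace is an assumption.

```python
import re

# One or more whitespace characters (spaces, tabs, newlines).
_WHITESPACE_RE = re.compile(r"\s+")


def basic_cleaners_sketch(text: str) -> str:
    """Hypothetical stand-in: lowercase, then collapse whitespace runs."""
    text = text.lower()
    # Replace every whitespace run with a single space.
    text = _WHITESPACE_RE.sub(" ", text)
    # Assumption: the ends are trimmed as well.
    return text.strip()


print(basic_cleaners_sketch("  Hello   WORLD\n"))  # hello world
```

No transliteration happens here, so non-ASCII characters pass through unchanged apart from lowercasing.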

egrecho.utils.text.cleaners.english_cleaners(string)

Pipeline for English text, including number and abbreviation expansion.

egrecho.utils.text.cleaners.chinese_mandarin_cleaners(string)

Basic pipeline for Mandarin Chinese text.

Return type:

str

egrecho.utils.text.cleaners.multilingual_cleaners(string)

Pipeline for multilingual text.

egrecho.utils.text.cleaners.transliteration_cleaners(string)

Pipeline for non-English text that transliterates to ASCII.
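To give a rough idea of what converting to ASCII can look like, the sketch below folds accented Latin letters to their base letters using only the standard library. This is a swapped-in approximation, not true transliteration and not necessarily what this pipeline does internally: characters with no ASCII decomposition (for example Cyrillic or Chinese) are simply dropped. The name `ascii_fold_sketch` is hypothetical.

```python
import unicodedata


def ascii_fold_sketch(text: str) -> str:
    """Hypothetical helper: strip diacritics via NFKD decomposition."""
    # NFKD splits "é" into "e" plus a combining accent.
    decomposed = unicodedata.normalize("NFKD", text)
    # Encoding with errors="ignore" discards everything that is not ASCII,
    # including the combining accents produced above.
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(ascii_fold_sketch("Crème brûlée"))  # Creme brulee
```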

egrecho.utils.text.cleaners.phoneme_cleaners(string)

Pipeline for phoneme mode, including number and abbreviation expansion.

egrecho.utils.text.cleaners.replace_symbols(string, lang='en')

Replace symbols based on the language tag.

Parameters:
  • string -- Input text.

  • lang -- Language identifier, e.g. “en”, “fr”, “pt”, “ca”.

Returns:

The modified string.

Example

Input:

string: “si l’avi cau, diguem-ho” lang: “ca”

Output:

string: “si lavi cau, diguemho”
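The documented example can be reproduced with a small language-dependent replacement table. The sketch below is an illustration only: `replace_symbols_sketch` is a hypothetical name, and every rule other than the Catalan hyphen and apostrophe removal shown in the example (the handling of ";", ":" and "&") is an assumption.

```python
def replace_symbols_sketch(text: str, lang: str = "en") -> str:
    """Hypothetical stand-in for a language-aware symbol replacer."""
    # Assumption: semicolons and colons are read as commas.
    text = text.replace(";", ",").replace(":", ",")
    # Catalan joins hyphenated words (per the documented example);
    # assumption: other languages turn the hyphen into a space.
    text = text.replace("-", "" if lang == "ca" else " ")
    # Assumption: "&" is spelled out in the target language.
    ampersand = {"en": " and ", "fr": " et ", "pt": " e ", "ca": " i "}
    if lang in ampersand:
        text = text.replace("&", ampersand[lang])
    if lang == "ca":
        # Drop both straight and typographic apostrophes.
        text = text.replace("'", "").replace("\u2019", "")
    return text


print(replace_symbols_sketch("si l\u2019avi cau, diguem-ho", lang="ca"))
# si lavi cau, diguemho
```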

egrecho.utils.text.number_cn

egrecho.utils.text.number_cn.replace_numbers_to_characters_in_text(text)

Replace all Arabic numerals in a text with their equivalent in simplified Chinese characters.

Parameters:

text (str) -- input text to transform

Returns:

output text

Return type:

str
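As a rough illustration of this kind of replacement, the sketch below rewrites runs of ASCII digits as Chinese numerals. It is not the library's implementation, and all names in it are hypothetical. It assumes non-negative integers only: values up to 9999 are read as cardinal numbers, and longer digit runs are read digit by digit. Decimals, signs, and units such as 万 are out of scope.

```python
import re

DIGITS = "零一二三四五六七八九"
UNITS = ["", "十", "百", "千"]  # ones, tens, hundreds, thousands


def int_to_chinese_sketch(n: int) -> str:
    """Hypothetical helper: cardinal reading for 0 <= n <= 9999."""
    if n == 0:
        return DIGITS[0]
    s = str(n)
    parts = []
    pending_zero = False
    for i, ch in enumerate(s):
        d = int(ch)
        if d == 0:
            pending_zero = True  # emit a single 零 only if a digit follows
            continue
        if pending_zero:
            parts.append(DIGITS[0])
            pending_zero = False
        parts.append(DIGITS[d] + UNITS[len(s) - 1 - i])
    result = "".join(parts)
    # 10-19 are read 十, 十一, ... rather than 一十, 一十一, ...
    return result[1:] if result.startswith("一十") else result


def digits_to_chinese_sketch(text: str) -> str:
    """Replace each run of ASCII digits in `text`."""
    def convert(match: re.Match) -> str:
        run = match.group()
        if len(run) > 4:
            # Assumption: long runs (IDs, phone numbers) are read digit by digit.
            return "".join(DIGITS[int(c)] for c in run)
        return int_to_chinese_sketch(int(run))

    return re.sub(r"[0-9]+", convert, text)


print(digits_to_chinese_sketch("我有3个苹果和25个橙子"))
# 我有三个苹果和二十五个橙子
```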