egrecho.core.loads#

class egrecho.core.loads.HLoads[source]#

Bases: ObjectDict

Structures load parameters as:

model:
    _cls_: egrecho.models.ecapa.model.EcapaModel
    # override_init_model_cfg
    config: {}
    # other kwargs placeholder
    ...
feature_extractor:
    _cls_: egrecho.data.features.feature_extractor_audio.KaldiFeatureExtractor
    # kwargs passing to _cls_.fetch_from
    ...
classmethod from_config(config=None, **kwargs)[source]#

Creates hloads from config.

The input config can be an instance of dict|str|Path|HLoads|MutableMapping; the **kwargs are merged recursively into config.

Normalize dict -> Merge -> (maybe) Omegaconf resolve -> Instantiate -> Output

Parameters:
  • config (Optional[Union[HLoads, MutableMapping, str, Path, Namespace, None]]) -- The configuration.

  • **kwargs -- Override kwargs

Returns:

The new hloads instance.

Return type:

HLoads
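
The recursive merge of **kwargs into config can be pictured with a small standard-library sketch. This is not the egrecho implementation: merge_recursively is a hypothetical helper written only to illustrate the "Merge" step, and it works on plain dicts rather than HLoads objects.

```python
from collections.abc import Mapping
from copy import deepcopy


def merge_recursively(base, override):
    """Return a new dict in which `override` is merged into `base`, key by key.

    Hypothetical helper for illustration; not part of egrecho.
    """
    merged = deepcopy(dict(base))
    for key, value in override.items():
        if isinstance(value, Mapping) and isinstance(merged.get(key), Mapping):
            # Both sides are mappings: descend instead of replacing the whole branch.
            merged[key] = merge_recursively(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


config = {
    "model": {"init_weight": "random", "strict": True},
    "feature_extractor": {"_cls_": "KaldiFeatureExtractor"},
}
kwargs = {"model": {"init_weight": "pretrained"}}

merged = merge_recursively(config, kwargs)
print(merged["model"])
```

Only model.init_weight changes; model.strict and the feature_extractor branch survive because the merge descends into nested mappings, and the original config is left unmodified.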

to_cfg_file(path, file_type=None, **kwargs)[source]#

Saves hloads.

Parameters:

path (Union[Path, str]) -- path to config file.

classmethod from_cfg_file(path, file_type=None, **kwargs)[source]#

Get hloads from file.

Parameters:

path (Union[Path, str]) -- path to config file.

Return type:

HLoads

class egrecho.core.loads.HResults[source]#

Bases: ObjectDict

Structures the loaded result of SaveLoadHelper.fetch_from.

class egrecho.core.loads.SaveLoadHelper[source]#

Bases: object

Saves and loads a model in a directory. Override this class for any special handling.

Example:

from egrecho.core.loads import SaveLoadHelper
from egrecho.models.ecapa.model import EcapaModel
from egrecho.data.features.feature_extractor_audio import KaldiFeatureExtractor
sl_helper = SaveLoadHelper()
extractor = KaldiFeatureExtractor()
model = EcapaModel()
dirpath = 'testdir/ecapa'
sl_helper.save_to(dirpath,model_or_state=model,components=extractor)
$ tree testdir/ecapa
testdir/ecapa/
├── config
│   ├── feature_config.yaml
│   ├── model_config.yaml
│   └── types.yaml
└── model_weight.ckpt
hresults = sl_helper.fetch_from(dirpath)
assert isinstance(hresults.model,EcapaModel)
assert isinstance(hresults.feature_extractor, KaldiFeatureExtractor)
# hloads control random init
hloads = {'model': {'init_weight': 'random'}}
hresults = sl_helper.fetch_from(dirpath, hloads=hloads)
# kwargs overrides to pretrained again
hresults = sl_helper.fetch_from(dirpath, hloads=hloads, model={'init_weight': 'pretrained'})

# now remove types.yaml
# rm -f testdir/ecapa/config/types.yaml
hresults = sl_helper.fetch_from(dirpath, single_key='model')
# raise ConfigurationException: Failed request model type
# Let's complete the model type
model_cls = 'egrecho.models.ecapa.model.EcapaModel'
hresults = sl_helper.fetch_from(dirpath, single_key='model', model={'_cls_': model_cls})
assert isinstance(hresults.model,EcapaModel)
# Type is ok
model_cls = EcapaModel
hresults = sl_helper.fetch_from(dirpath, single_key='model', model={'_cls_': model_cls})
assert isinstance(hresults.model,EcapaModel)
# classname string is ok as EcapaModel is already imported
model_cls = 'EcapaModel'
hresults = sl_helper.fetch_from(dirpath, single_key='model', model={'_cls_': model_cls})
assert isinstance(hresults.model,EcapaModel)
model_cls = 'Valle'
# Error as 'Valle' is not registered.
hresults = sl_helper.fetch_from(
    dirpath,
    single_key="model",
    kwargs_recurse_override=False,
    model={"_cls_": model_cls, "init_weight": "random", "config": None},
)  # only load model without weight and eliminate the influences of Ecapa model directory
from egrecho.models.valle.model import Valle
# Try again.
hresults = sl_helper.fetch_from(
    dirpath,
    single_key="model",
    kwargs_recurse_override=False,
    model={"_cls_": model_cls, "init_weight": "random", "config": None},
)
assert isinstance(hresults.model, Valle)
save_to(savedir, model_or_state=None, components=None, **kwargs)[source]#

Saves a model after pretraining.

Exports a pretrained model with its subcomponents (configs, tokenizer, etc.) to an output directory like:

./savedir
├── model_weight.ckpt
└── ./config
    ├── model_config.yaml
    ├── feature_config.yaml
    └── types.yaml
Parameters:
  • savedir -- local directory.

  • model_or_state (Union[TopVirtualModel, Dict[str, Any], None]) -- TopVirtualModel object or model state dict to be saved.

  • components (Optional[Iterable[Any]]) -- tokenizer, feature extractor, and similar objects.

fetch_from(srcdir, hloads=None, base_model_cls=None, skip_keys=None, single_key=None, return_hloads=False, kwargs_recurse_override=True, **kwargs)[source]#

Loads the module classes specified in HLoads. Returns an HResults dict containing (MODEL, FEATURE_EXTRACTOR, …), where MODEL can be:

  • An instance of the model.

  • None when skip_keys applies to the model.

Note

Workflow is defined as a sequence of the following operations:

  1. Resolve hloads.

    Users can use a config file or pass kwargs to control the behaviour.

  2. Load available class types via load_types_dict().

    Note that a bare class name can be passed as the type if that class is imported in the current namespace and is a subclass of a supported base module, currently (TopVirtualModel, BaseFeature, BaseTokenizer). E.g., instead of passing a full class path such as 'egrecho.models.ecapa.model.EcapaModel', the user can first import that class in a Python module, which acts as registration; the class name 'EcapaModel' is then available. This mechanism can simplify parameter control.

  3. Instantiate classes via instantiate_classes() according to the types dict.

    In particular, the model is loaded lazily, i.e., as a tuple of (MODEL_CLS, INIT_MODEL_CFG, LEFT_MODEL_CFG) resolved by instantiate_model(lazy_model=True).

  4. Instantiate the model via _instantiate_model().

    Users may override this method in subclasses.

  5. Load the model weight.

Parameters:
  • srcdir (Union[str, Path]) --

    Model directory like:

    ./srcdir
    ├── model_weight.ckpt
    └── ./config
        ├── model_config.yaml
        ├── feature_config.yaml
        └── types.yaml
    

  • hloads (Union[str, Path, Dict[str, Any], None]) --

    Optional hparam dict or file (Path|str|Dict) with a hierarchical structure, as in this example:

    model:
        _cls_: egrecho.models.ecapa.model.EcapaModel
        # override_init_model_cfg
        config: {}
        # other kwargs placeholder
        ...
    feature_extractor:
        _cls_: egrecho.data.features.feature_extractor_audio.KaldiFeatureExtractor
        # kwargs passing to _cls_.fetch_from
        ...
    

    You most likely won’t need this since the default behaviour works well. However, this argument gives a chance to complete or override kwargs.

  • base_model_cls (Union[str, Type, None]) -- Base model class

  • single_key (Optional[str]) -- Load only the specified key.

  • skip_keys (Union[str, Literal['model', 'others', 'null'], Set[str], None]) -- Keys to skip, e.g., skip the model. Ignored when single_key is set.

  • kwargs_recurse_override (bool) -- Whether kwargs recursively override hloads.

  • kwargs (Dict[str,Any]) --

    Overrides hloads.

    Hint

    Example of model-related params.

    self.fetch_from(..., model=dict(init_weight='last.ckpt', strict=False))
    
    • init_weight: Where to initialize weights from: 'pretrained' or 'random', a checkpoint file name (e.g., model_weight.ckpt), or a full path to a checkpoint (/path/to/model_weight.ckpt). Default: 'pretrained'

    • map_location: MAP_LOCATION_TYPE as in torch.load(). Defaults to 'cpu'. If you prefer to load a checkpoint saved from a GPU model onto the GPU, set it to None (no move to another device) or to a specific device.

    • strict: bool, optional. Whether to strictly enforce that the keys in the checkpoint match the keys returned by this module’s state dict. Defaults to True.

Return type:

HResults

Returns:

A HResults dict.
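
One way to read the init_weight options described above is as a mapping from the option value to a checkpoint path. The sketch below is an assumption about that mapping, not the library's code: resolve_init_weight is a hypothetical function, and the rule that a bare file name is looked up inside srcdir is inferred from the parameter description.

```python
from pathlib import Path


def resolve_init_weight(srcdir, init_weight="pretrained", default_name="model_weight.ckpt"):
    """Map an init_weight value to a checkpoint path, or None for random init.

    Hypothetical helper for illustration; the real lookup rules may differ.
    """
    if init_weight == "random":
        # No checkpoint is loaded; the model keeps its freshly initialized weights.
        return None
    if init_weight == "pretrained":
        return Path(srcdir) / default_name
    candidate = Path(init_weight)
    if candidate.parent == Path("."):
        # A bare file name such as 'last.ckpt' is assumed to live inside srcdir.
        return Path(srcdir) / candidate
    # Anything with a directory part is used as given.
    return candidate


for option in ("pretrained", "random", "last.ckpt", "/path/to/model_weight.ckpt"):
    print(option, "->", resolve_init_weight("testdir/ecapa", option))
```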

load_model_with_components(cfg_dir, hloads=None, base_model_cls=None, skip_keys=None, single_key=None, lazy_model=False)[source]#

Loads the module classes specified in HLoads. Returns a tuple (MODEL, COMPONENTS, HLOADS), where the types dict indicates which classes are used to instantiate objects. MODEL can be:

  • A tuple of (MODEL_INSTANCE, LEFT_MODEL_CFG).

  • A tuple of (MODEL_CLS, INIT_MODEL_CFG, LEFT_MODEL_CFG), resolved as a lazy model in _instantiate_model().

  • None when skip_keys applies to the model.

Parameters:
  • cfg_dir (str) -- Directory containing the config files.

  • hloads (Optional[HLoads]) --

    Optional hparam dict or file (Path|str|Dict) with a hierarchical structure, as in this example:

    model:
        _cls_: egrecho.models.ecapa.model.EcapaModel
        # replace default model_config.yaml
        config_fname: some_config.yaml
        # override_init_model_cfg
        config: {}
        # other kwargs placeholder
        ...
    feature_extractor:
        _cls_: egrecho.data.features.feature_extractor_audio.KaldiFeatureExtractor
        # kwargs passing to _cls_.fetch_from
        ...
    

    You most likely won’t need this since the default behaviour works well. However, this argument gives a chance to complete or override kwargs.

  • base_model_cls (Union[str, Type, None]) -- Base model class

  • single_key (Optional[str]) -- Load only the specified key.

  • skip_keys (Union[str, Literal['model', 'others', 'null'], Set[str], None]) -- Keys to skip, e.g., skip the model. Ignored when single_key is set.

  • lazy_model (bool) -- If False, instantiate the model; otherwise return only the model class with its init config. Default: False

Return type:

Tuple[Union[Tuple[Module, Dict[str, Any]], Tuple[Type, Dict[str, Any], Dict[str, Any]], None], Dict[str, Any], HLoads]

Returns:

A tuple containing (MODEL, COMPONENTS, HLOADS).

modify_state_dict(state_dict, model_cfg)[source]#

Allows modifying the state dict before loading parameters into a model.

Parameters:
  • state_dict -- The state dict restored from the checkpoint.

  • model_cfg -- A model-level dict object.

Returns:

A potentially modified state dict.
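
As an illustration of what an override might do, the sketch below strips a "module." prefix from checkpoint keys, a renaming that is often needed when a checkpoint was saved from a wrapped model. The names strip_prefix and PrefixStrippingHelper are hypothetical; the class is a plain stand-in that only mirrors the modify_state_dict(state_dict, model_cfg) signature and does not subclass SaveLoadHelper.

```python
def strip_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with `prefix` removed from keys that start with it."""
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }


class PrefixStrippingHelper:
    """Hypothetical stand-in for a SaveLoadHelper subclass (illustration only)."""

    def modify_state_dict(self, state_dict, model_cfg):
        # model_cfg is unused in this sketch; a real override might branch on it.
        return strip_prefix(state_dict)


# Plain lists stand in for tensors so the sketch runs without torch.
checkpoint_state = {"module.encoder.weight": [1.0], "head.bias": [0.0]}
cleaned = PrefixStrippingHelper().modify_state_dict(checkpoint_state, model_cfg={})
print(sorted(cleaned))
```

Returning a new dict rather than mutating the input keeps the restored checkpoint intact in case it is needed again.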

load_instance_with_state_dict(instance, state_dict, strict)[source]#

Utility method that loads a model instance with the (potentially modified) state dict.

Parameters:
  • instance -- ModelPT subclass instance.

  • state_dict -- The state dict (which may have been modified)

  • strict -- Bool, whether to perform strict checks when loading the state dict.

egrecho.core.loads.save_ckpt_conf_dir(ckptdir, model_conf=None, extractor=None, model_type=None, feature_extractor_type=None, **kwargs)[source]#

Saves the extractor, model_type, etc. to a directory so that loading from pretrained is convenient.

Constructs a directory like:

./ckptdir
└── ./config
    ├── model_config.yaml
    ├── feature_config.yaml
    └── types.yaml
Parameters:
  • ckptdir (str) -- the parent of the save directory; a config subdir is created inside it to hold the files.

  • model_conf (Optional[Dict[str, Any]]) -- a dict of model config.

  • extractor (Union[Dict[str, Any], BaseFeature, None]) -- either a dict or an instance of BaseFeature.

  • model_type (Union[str, Type, None]) -- model class type or class import path.

  • feature_extractor_type (Union[str, Type, None]) -- feature extractor class type or class import path.

class egrecho.core.loads.ResolveModelResult(checkpoint=None, model_type=None, feature_config=None)[source]#

Bases: object

Resolved opts.

Parameters:
  • checkpoint (str) -- ckpt weight path

  • model_type (str) -- model type string.

  • feature_config (Optional[Dict[str, Any]]) -- loaded dict of feature extractor config.

egrecho.core.loads.resolve_pretrained_model(checkpoint='last.ckpt', dirpath=None, best_k_mode='min', version='version', extractor_fname='feature_config.yaml', **resolve_ckpt_kwargs)[source]#

Resolves checkpoint, model_type, feats_config.

See resolve_ckpt() for details on checkpoint resolving. Automatically resolves a local directory like:

./dirpath/version_1
        └── checkpoints
            ├── best_k_models.yaml
            ├── last.ckpt
            ├── abc.ckpt
            └── ./config
                ├── model_config.yaml
                ├── feature_config.yaml
                └── types.yaml
Parameters:
  • checkpoint (str, optional) -- The file name of the checkpoint to resolve. A local file needs a suffix like ".ckpt" / ".pt". checkpoint="best" is a reserved key: it looks up best_k_fname, a file containing Dict[BEST_K_MODEL_PATH, BEST_K_SCORE], and sorts by score to pick the best checkpoint. Defaults to "last.ckpt".

  • dirpath (Path or str, optional) -- The root path. Defaults to None, which means the current directory.

  • version (str, optional) -- The versioned subdir name. Subdirs are commonly named "version_0", "version_1", and so on. If you specify a version name with a version number, that version dir is searched; otherwise the highest version number is chosen ("version_1" in the tree above). Defaults to "version".

  • best_k_mode (Literal["max", "min"], optional) -- The mode for selecting the best_k checkpoint. Defaults to "min".

  • extractor_fname (str) -- feature extractor file name, searched for in the config/ subdir. Defaults to "feature_config.yaml".

  • resolve_ckpt_kwargs (dict) -- additional kwargs to resolve_ckpt().

Return type:

ResolveModelResult
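
The two selection rules described in the parameters (best checkpoint by score, highest version number by default) can be sketched with the standard library. Both functions are hypothetical illustrations operating on in-memory data; they do not read best_k_models.yaml or touch the file system, and the real resolve_ckpt() may behave differently in edge cases.

```python
import re


def pick_best_checkpoint(best_k, mode="min"):
    """Return the checkpoint path whose score is best under `mode`.

    `best_k` mirrors the documented Dict[BEST_K_MODEL_PATH, BEST_K_SCORE].
    """
    if mode not in ("min", "max"):
        raise ValueError(f"mode must be 'min' or 'max', got {mode!r}")
    chooser = min if mode == "min" else max
    return chooser(best_k, key=best_k.get)


def pick_latest_version(subdirs, version="version"):
    """Return the subdir named '<version>_<n>' with the largest n, or None."""
    pattern = re.compile(rf"^{re.escape(version)}_(\d+)$")
    numbered = [
        (int(match.group(1)), name)
        for name in subdirs
        if (match := pattern.match(name))
    ]
    return max(numbered)[1] if numbered else None


scores = {"epoch=3.ckpt": 0.12, "epoch=7.ckpt": 0.08}
print(pick_best_checkpoint(scores, mode="min"))
print(pick_latest_version(["version_0", "version_1", "version_10", "logs"]))
```

Comparing the parsed integers rather than the raw names matters here: a plain string sort would rank "version_10" below "version_2".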

egrecho.core.loads.load_module_class(module_path, base_module_type=None)[source]#

Given an import path that contains a class, returns the class type.

If the import path is in full format, it should be a dotted import path whose last part is the class name.

If only a class name is provided (without a dot "."), it is resolved among the subclasses of base_module_type that have been registered via imports in a Python file, matching the class name against the last part. If one name matches more than one class, resolution fails and you need to provide the full path to eliminate the ambiguity.

Parameters:
  • module_path (str) -- The import path containing the module class. If only a class name is provided, that class must be registered by importing it in your Python code.

  • base_module_type (Type, optional) -- The base class type to check against.

Returns:

The class type loaded from the module path.

Return type:

Type
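
The two lookup modes (dotted path versus bare class name) can be approximated as follows. This is a minimal sketch under stated assumptions, not the egrecho source: load_class and all_subclasses are hypothetical names, "registered via imports" is modelled by walking __subclasses__(), and the subclass check on dotted paths is a guess at what "the base class type to check against" means.

```python
import importlib


def all_subclasses(base):
    """Collect every direct and indirect subclass of `base` that is currently imported."""
    found = set()
    for sub in base.__subclasses__():
        found.add(sub)
        found |= all_subclasses(sub)
    return found


def load_class(module_path, base_type=None):
    """Resolve a dotted import path, or a bare class name among subclasses of base_type."""
    if "." in module_path:
        module_name, _, class_name = module_path.rpartition(".")
        cls = getattr(importlib.import_module(module_name), class_name)
        if base_type is not None and not issubclass(cls, base_type):
            raise TypeError(f"{cls!r} is not a subclass of {base_type!r}")
        return cls
    if base_type is None:
        raise ValueError("a bare class name needs base_type to search its subclasses")
    matches = [c for c in all_subclasses(base_type) if c.__name__ == module_path]
    if len(matches) != 1:
        # Zero matches means the class was never imported; several mean the name is ambiguous.
        raise ValueError(f"expected exactly one class named {module_path!r}, found {len(matches)}")
    return matches[0]


class BaseModel:
    """Hypothetical base class standing in for TopVirtualModel."""


class TinyModel(BaseModel):
    """Defining (or importing) the subclass is what makes its bare name resolvable."""


print(load_class("collections.OrderedDict"))
print(load_class("TinyModel", BaseModel))
```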

egrecho.core.loads.load_model_type(module_path, base_module_type=<class 'egrecho.core.module.TopVirtualModel'>)[source]#

Given an import path that contains a class, returns the class type.

If the import path is in full format, it should be a dotted import path whose last part is the class name.

If only a class name is provided (without a dot "."), it is resolved among the subclasses of base_module_type that have been registered via imports in a Python file, matching the class name against the last part. If one name matches more than one class, resolution fails and you need to provide the full path to eliminate the ambiguity.

Parameters:
  • module_path (str) -- The import path containing the module class. If only a class name is provided, that class must be registered by importing it in your Python code.

  • base_module_type (Type, optional) -- The base class type to check against.

Returns:

The class type loaded from the module path.

Return type:

Type

egrecho.core.loads.load_extractor_type(module_path, base_module_type=<class 'egrecho.core.feature_extractor.BaseFeature'>)[source]#

Given an import path that contains a class, returns the class type.

If the import path is in full format, it should be a dotted import path whose last part is the class name.

If only a class name is provided (without a dot "."), it is resolved among the subclasses of base_module_type that have been registered via imports in a Python file, matching the class name against the last part. If one name matches more than one class, resolution fails and you need to provide the full path to eliminate the ambiguity.

Parameters:
  • module_path (str) -- The import path containing the module class. If only a class name is provided, that class must be registered by importing it in your Python code.

  • base_module_type (Type, optional) -- The base class type to check against.

Returns:

The class type loaded from the module path.

Return type:

Type

egrecho.core.loads.load_extend_default_type(module_path, default_type=None)[source]#

Allows a simple class name when default_type is provided.

  • If the import path is in dotted format such as "calendar.Calendar", it should be the full import path whose last part is the class name. Note that default_type is ignored in this case.

  • If only a class name is provided (without a dot "."), default_type must be provided as the base_module_type; the name is then resolved among the subclasses of default_type that have been registered via imports in a Python file. If one name matches more than one class, resolution fails and you need to provide the full path to eliminate the ambiguity.

Parameters:
  • module_path (str) -- The import path containing the module class. If only a class name is provided, that class must be a subclass of default_type registered by importing it in your Python code.

  • default_type (Type, optional) -- The default class type.

Returns:

The class type loaded from the module path.

Return type:

Type

class egrecho.core.loads.SpecialSkipType(value)[source]#

Bases: StrEnum

Special skip mode when loading modules.

  • model: exclude the model.

  • others: load only the model.
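
How the skip modes interact with single_key can be pictured as a key filter. The sketch below is an interpretation of the documented options, not the library's logic: select_keys is a hypothetical function, and treating 'null' (listed in the skip_keys type annotation) as "skip nothing" is an assumption.

```python
def select_keys(available, skip_keys=None, single_key=None):
    """Return the keys that would be loaded, preserving the order of `available`.

    Hypothetical helper for illustration; not part of egrecho.
    """
    if single_key is not None:
        # single_key takes precedence, so skip_keys is ignored in this sketch.
        return [key for key in available if key == single_key]
    if skip_keys in (None, "null"):
        # Assumption: 'null' means that nothing is skipped.
        return list(available)
    if skip_keys == "model":
        return [key for key in available if key != "model"]
    if skip_keys == "others":
        return [key for key in available if key == "model"]
    skipped = {skip_keys} if isinstance(skip_keys, str) else set(skip_keys)
    return [key for key in available if key not in skipped]


keys = ["model", "feature_extractor", "tokenizer"]
print(select_keys(keys, skip_keys="model"))
print(select_keys(keys, skip_keys="others"))
print(select_keys(keys, skip_keys={"tokenizer"}))
print(select_keys(keys, skip_keys="model", single_key="model"))
```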