opennmt.tokenizers.opennmt_tokenizer module

Define the OpenNMT tokenizer.

opennmt.tokenizers.opennmt_tokenizer.create_tokenizer(config)

Creates a new OpenNMT tokenizer.

Parameters: config – A dictionary of tokenization options.
Returns: A pyonmttok.Tokenizer.
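
As a rough illustration of what a tokenization options dictionary might look like, the sketch below separates the tokenization mode from the remaining options. The helper split_mode_and_options is hypothetical and not part of OpenNMT-tf. The "mode" key, the "conservative" default, and the joiner_annotate option are assumptions based on the pyonmttok.Tokenizer constructor, not something this page confirms.

```python
# Hypothetical helper, NOT part of OpenNMT-tf: it only illustrates how a
# configuration dictionary could map onto tokenizer constructor arguments.
def split_mode_and_options(config, default_mode="conservative"):
    """Returns the tokenization mode and the remaining options."""
    options = dict(config)  # Copy so the caller's dictionary is left untouched.
    mode = options.pop("mode", default_mode)  # Assumed key and default value.
    return mode, options


# Example options; the keys are assumptions, see the pyonmttok documentation.
config = {"mode": "aggressive", "joiner_annotate": True}
mode, options = split_mode_and_options(config)

# With pyonmttok installed, a comparable direct construction would be:
#   tokenizer = pyonmttok.Tokenizer(mode, **options)
```

In practice the dictionary would be passed directly to create_tokenizer; the split above only makes the assumed structure of the options explicit.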

class opennmt.tokenizers.opennmt_tokenizer.OpenNMTTokenizer(*arg, **kwargs)

Bases: opennmt.tokenizers.tokenizer.Tokenizer

Uses the OpenNMT tokenizer.

initialize(metadata, asset_dir=None, asset_prefix='')

Initializes the tokenizer (e.g. by loading BPE models).

Parameters:
  • metadata – A dictionary containing additional metadata set by the user.
  • asset_dir – The directory where assets can be written. If None, no assets are returned.
  • asset_prefix – The prefix to attach to asset filenames.
Returns:

A dictionary containing additional assets used by the tokenizer.

export_assets(asset_dir, asset_prefix='')

Exports assets for this tokenizer.

Parameters:
  • asset_dir – The directory where assets can be written.
  • asset_prefix – The prefix to attach to asset filenames.
Returns:

A dictionary containing additional assets used by the tokenizer.
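
The asset contract described by initialize and export_assets (files are written under asset_dir, filenames carry asset_prefix, a dictionary of assets is returned, and nothing is returned when asset_dir is None) can be sketched with a stand-in function. Everything below is an assumption made for illustration: export_config_asset is hypothetical, and the JSON format, the tokenizer_config.json filename, and the filename-to-path mapping are not taken from the library.

```python
import json
import os
import tempfile


def export_config_asset(config, asset_dir=None, asset_prefix=""):
    """Hypothetical stand-in that mimics the documented asset contract."""
    if asset_dir is None:
        # Mirrors "If None, no assets are returned."
        return {}
    # Assumed filename and format, chosen only for this sketch.
    filename = asset_prefix + "tokenizer_config.json"
    path = os.path.join(asset_dir, filename)
    with open(path, "w") as asset_file:
        json.dump(config, asset_file)
    # Assumed return shape: asset filename mapped to its path on disk.
    return {filename: path}


config = {"mode": "aggressive", "joiner_annotate": True}
asset_dir = tempfile.mkdtemp()  # Temporary directory standing in for asset_dir.

assets = export_config_asset(config, asset_dir, asset_prefix="source_")
no_assets = export_config_asset(config)  # asset_dir=None, so nothing is written.
```

A prefix such as "source_" keeps the assets of several tokenizers apart when they share one directory.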