opennmt.decoders.decoder module

Base class and functions for dynamic decoders.

opennmt.decoders.decoder.logits_to_cum_log_probs(logits, sequence_length)[source]

Returns the cumulated log probabilities of sequences.

Parameters:
  • logits – The sequence of logits of shape \([B, T, ...]\).
  • sequence_length – The length of each sequence of shape \([B]\).
Returns:

The cumulated log probability of each sequence.
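A minimal usage sketch, assuming TensorFlow 1.x and toy shapes (batch of 2, 3 timesteps, vocabulary of 5); the tensor values are placeholders:

    import tensorflow as tf
    from opennmt.decoders.decoder import logits_to_cum_log_probs

    # Toy decoder output: B=2 sequences, T=3 steps, 5-word vocabulary.
    logits = tf.random.normal([2, 3, 5])
    sequence_length = tf.constant([3, 2])

    # One cumulated log probability per sequence, i.e. a tensor of shape [B].
    cum_log_probs = logits_to_cum_log_probs(logits, sequence_length)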

opennmt.decoders.decoder.get_embedding_fn(embedding)[source]

Returns the embedding function.

Parameters:
  • embedding – The embedding tensor or a callable that takes word ids.
Returns:

A callable that takes word ids.
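A usage sketch with toy sizes, assuming TensorFlow 1.x variable creation; when given a tensor, the returned callable is expected to perform an embedding lookup:

    import tensorflow as tf
    from opennmt.decoders.decoder import get_embedding_fn

    embedding_var = tf.get_variable("target_embs", shape=[100, 64])
    embedding_fn = get_embedding_fn(embedding_var)

    # The callable maps word ids to their embeddings, e.g. [1, 3] ids -> [1, 3, 64].
    inputs = embedding_fn(tf.constant([[3, 52, 7]]))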
opennmt.decoders.decoder.build_output_layer(num_units, vocab_size, dtype=None)[source]

Builds the output projection layer.

Parameters:
  • num_units – The layer input depth.
  • vocab_size – The layer output depth.
  • dtype – The layer dtype.
Returns:

A tf.layers.Dense instance.

Raises:

ValueError – if vocab_size is None.
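A usage sketch with toy dimensions, assuming TensorFlow 1.x; the returned tf.layers.Dense projects the last dimension of the decoder outputs onto the vocabulary:

    import tensorflow as tf
    from opennmt.decoders.decoder import build_output_layer

    output_layer = build_output_layer(64, 100, dtype=tf.float32)

    decoder_outputs = tf.zeros([2, 3, 64])   # [B, T, num_units]
    logits = output_layer(decoder_outputs)   # [B, T, vocab_size]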

opennmt.decoders.decoder.get_sampling_probability(global_step, read_probability=None, schedule_type=None, k=None)[source]

Returns the sampling probability as described in https://arxiv.org/abs/1506.03099.

Parameters:
  • global_step – The training step.
  • read_probability – The probability to read from the inputs.
  • schedule_type – The type of schedule.
  • k – The convergence constant.
Returns:

The probability to sample from the output ids as a 0D tf.Tensor or None if scheduled sampling is not configured.

Raises:
  • ValueError – if schedule_type is set but k is not, or if schedule_type is linear and an initial read_probability is not set.
  • TypeError – if schedule_type is invalid.
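For intuition, the decay schedules introduced in the scheduled sampling paper roughly follow the formulas sketched below. This standalone snippet only illustrates the schedules; the helper name read_probability_at and the linear decay constant c are illustrative, not part of the library:

    import math

    def read_probability_at(step, schedule_type, k, initial_read_probability=1.0, c=1e-5):
        # Probability of reading the gold input at the given step (epsilon_i in the paper).
        if schedule_type == "constant":
            return initial_read_probability
        if schedule_type == "exponential":
            return k ** step                        # requires k < 1
        if schedule_type == "inverse_sigmoid":
            return k / (k + math.exp(step / k))     # requires k >= 1
        if schedule_type == "linear":
            return max(0.0, initial_read_probability - c * step)
        raise TypeError("invalid schedule type: %s" % schedule_type)

    # The sampling probability is the complement of the read probability.
    sampling_probability = 1.0 - read_probability_at(10000, "inverse_sigmoid", k=1000)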
class opennmt.decoders.decoder.Decoder[source]

Bases: object

Base class for decoders.

decode(inputs, sequence_length, vocab_size=None, initial_state=None, sampling_probability=None, embedding=None, output_layer=None, mode='train', memory=None, memory_sequence_length=None, return_alignment_history=False)[source]

Decodes a full input sequence.

Usually used for training and evaluation where target sequences are known.

Parameters:
  • inputs – The input to decode of shape \([B, T, ...]\).
  • sequence_length – The length of each input with shape \([B]\).
  • vocab_size – The output vocabulary size. Must be set if output_layer is not set.
  • initial_state – The initial state as a (possibly nested tuple of…) tensors.
  • sampling_probability – The probability of sampling categorically from the output ids instead of reading directly from the inputs.
  • embedding – The embedding tensor or a callable that takes word ids. Must be set when sampling_probability is set.
  • output_layer – Optional layer to apply to the output prior to sampling. Must be set if vocab_size is not set.
  • mode – A tf.estimator.ModeKeys mode.
  • memory – (optional) Memory values to query.
  • memory_sequence_length – (optional) Memory values length.
  • return_alignment_history – If True, also returns the alignment history from the attention layer (None will be returned if unsupported by the decoder).
Returns:

A tuple (outputs, state, sequence_length) or (outputs, state, sequence_length, alignment_history) if return_alignment_history is True.
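A training-time sketch, assuming the AttentionalRNNDecoder implementation from opennmt.decoders.rnn_decoder and TensorFlow 1.x graph mode; all sizes and tensors are toy placeholders:

    import tensorflow as tf
    from opennmt.decoders.rnn_decoder import AttentionalRNNDecoder

    decoder = AttentionalRNNDecoder(2, 64)

    target_embs = tf.zeros([4, 6, 64])            # embedded target tokens [B, T, depth]
    target_length = tf.constant([6, 5, 4, 6])     # [B]
    encoder_outputs = tf.zeros([4, 10, 64])       # memory to attend over [B, T_src, depth]
    source_length = tf.constant([10, 8, 10, 7])   # [B]

    outputs, state, length = decoder.decode(
        target_embs,
        target_length,
        vocab_size=100,
        mode=tf.estimator.ModeKeys.TRAIN,
        memory=encoder_outputs,
        memory_sequence_length=source_length)
    # outputs are the logits over the vocabulary, typically fed to a cross entropy loss.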

support_alignment_history

Returns True if this decoder can return the attention as alignment history.

support_multi_source

Returns True if this decoder supports multiple source contexts.

dynamic_decode(embedding, start_tokens, end_token, vocab_size=None, initial_state=None, output_layer=None, maximum_iterations=250, minimum_length=0, mode='infer', memory=None, memory_sequence_length=None, dtype=None, return_alignment_history=False, sample_from=None, sample_temperature=None)[source]

Decodes dynamically from start_tokens with greedy search.

Usually used for inference.

Parameters:
  • embedding – The embedding tensor or a callable that takes word ids.
  • start_tokens – The start token ids with shape \([B]\).
  • end_token – The end token id.
  • vocab_size – The output vocabulary size. Must be set if output_layer is not set.
  • initial_state – The initial state as a (possibly nested tuple of…) tensors.
  • output_layer – Optional layer to apply to the output prior to sampling. Must be set if vocab_size is not set.
  • maximum_iterations – The maximum number of decoding iterations.
  • minimum_length – The minimum length of decoded sequences (end_token excluded).
  • mode – A tf.estimator.ModeKeys mode.
  • memory – (optional) Memory values to query.
  • memory_sequence_length – (optional) Memory values length.
  • dtype – The data type. Required if memory is None.
  • return_alignment_history – If True, also returns the alignment history from the attention layer (None will be returned if unsupported by the decoder).
  • sample_from – Sample predictions from the sample_from most likely tokens. If 0, sample from the full output distribution.
  • sample_temperature – The value by which logits are divided. In random sampling, a high value generates more random samples.
Returns:

A tuple (predicted_ids, state, sequence_length, log_probs) or (predicted_ids, state, sequence_length, log_probs, alignment_history) if return_alignment_history is True.
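An inference-time sketch for greedy decoding, reusing the same assumptions (AttentionalRNNDecoder, TensorFlow 1.x graph mode); the start and end token ids are hypothetical:

    import tensorflow as tf
    from opennmt.decoders.rnn_decoder import AttentionalRNNDecoder

    decoder = AttentionalRNNDecoder(2, 64)
    embedding = tf.get_variable("target_embs", shape=[100, 64])

    encoder_outputs = tf.zeros([4, 10, 64])
    source_length = tf.constant([10, 8, 10, 7])

    predicted_ids, state, length, log_probs = decoder.dynamic_decode(
        embedding,
        tf.fill([4], 1),                  # hypothetical <s> id for each batch entry
        2,                                # hypothetical </s> id
        vocab_size=100,
        maximum_iterations=50,
        memory=encoder_outputs,
        memory_sequence_length=source_length)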

dynamic_decode_and_search(embedding, start_tokens, end_token, vocab_size=None, initial_state=None, output_layer=None, beam_width=5, length_penalty=0.0, maximum_iterations=250, minimum_length=0, mode='infer', memory=None, memory_sequence_length=None, dtype=None, return_alignment_history=False, sample_from=None, sample_temperature=None, coverage_penalty=0.0)[source]

Decodes dynamically from start_tokens with beam search.

Usually used for inference.

Parameters:
  • embedding – The embedding tensor or a callable that takes word ids.
  • start_tokens – The start token ids with shape \([B]\).
  • end_token – The end token id.
  • vocab_size – The output vocabulary size. Must be set if output_layer is not set.
  • initial_state – The initial state as a (possibly nested tuple of…) tensors.
  • output_layer – Optional layer to apply to the output prior to sampling. Must be set if vocab_size is not set.
  • beam_width – The width of the beam.
  • length_penalty – The length penalty weight during beam search.
  • maximum_iterations – The maximum number of decoding iterations.
  • minimum_length – The minimum length of decoded sequences (end_token excluded).
  • mode – A tf.estimator.ModeKeys mode.
  • memory – (optional) Memory values to query.
  • memory_sequence_length – (optional) Memory values length.
  • dtype – The data type. Required if memory is None.
  • return_alignment_history – If True, also returns the alignment history from the attention layer (None will be returned if unsupported by the decoder).
  • sample_from – Sample predictions from the sample_from most likely tokens. If 0, sample from the full output distribution.
  • sample_temperature – The value by which logits are divided. In random sampling, a high value generates more random samples.
  • coverage_penalty – The coverage penalty weight during beam search.
Returns:

A tuple (predicted_ids, state, sequence_length, log_probs) or (predicted_ids, state, sequence_length, log_probs, alignment_history) if return_alignment_history is True.
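Continuing the greedy sketch above with the same decoder, embedding, and encoder tensors, the beam search variant adds the beam-specific arguments; the token ids and penalty values remain placeholders:

    predicted_ids, state, length, log_probs = decoder.dynamic_decode_and_search(
        embedding,
        tf.fill([4], 1),                  # hypothetical <s> id
        2,                                # hypothetical </s> id
        vocab_size=100,
        beam_width=5,
        length_penalty=0.2,
        maximum_iterations=50,
        memory=encoder_outputs,
        memory_sequence_length=source_length)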

decode_from_inputs(inputs, sequence_length, initial_state=None, mode='train', memory=None, memory_sequence_length=None)[source]

Decodes from full inputs.

Parameters:
  • inputs – The input to decode of shape \([B, T, ...]\).
  • sequence_length – The length of each input with shape \([B]\).
  • initial_state – The initial state as a (possibly nested tuple of…) tensors.
  • mode – A tf.estimator.ModeKeys mode.
  • memory – (optional) Memory values to query.
  • memory_sequence_length – (optional) Memory values length.
Returns:

A tuple (outputs, state) or (outputs, state, attention) if self.support_alignment_history.

step_fn(mode, batch_size, initial_state=None, memory=None, memory_sequence_length=None, dtype=tf.float32)[source]

Callable to run decoding steps.

Parameters:
  • mode – A tf.estimator.ModeKeys mode.
  • batch_size – The batch size.
  • initial_state – The initial state to start from as a (possibly nested tuple of…) tensors.
  • memory – (optional) Memory values to query.
  • memory_sequence_length – (optional) Memory values length.
  • dtype – The data type.
Returns:

A callable with the signature (step, inputs, state, mode) -> (outputs, state) or (outputs, state, attention) if self.support_alignment_history.

class opennmt.decoders.decoder.DecoderV2(num_sources=1, **kwargs)[source]

Bases: tensorflow.python.keras.engine.base_layer.Layer

Base class for decoders.

Note

TensorFlow 2.0 version.

__init__(num_sources=1, **kwargs)[source]

Initializes the decoder parameters.

Parameters:
  • num_sources – The number of source contexts expected by this decoder.
  • **kwargs – Additional layer arguments.
Raises:

ValueError – if the number of source contexts num_sources is not supported by this decoder.

minimum_sources

The minimum number of source contexts supported by this decoder.

maximum_sources

The maximum number of source contexts supported by this decoder.

initialize(vocab_size=None, output_layer=None)[source]

Initializes the decoder configuration.

Parameters:
  • vocab_size – The target vocabulary size.
  • output_layer – The output layer to use.
Raises:

ValueError – if neither vocab_size nor output_layer is set.

get_initial_state(initial_state=None, batch_size=None, dtype=tf.float32)[source]

Returns the initial decoder state.

Parameters:
  • initial_state – An initial state to start from, e.g. the last encoder state.
  • batch_size – The batch size to use.
  • dtype – The dtype of the state.
Returns:

A nested structure of tensors representing the decoder state.

Raises:
  • RuntimeError – if the decoder was not initialized.
  • ValueError – if batch_size or dtype is not set and no initial_state is passed.
call(inputs, length_or_step=None, state=None, memory=None, memory_sequence_length=None, training=None)[source]

Runs the decoder layer on either a complete sequence (e.g. for training or scoring), or a single timestep (e.g. for iterative decoding).

Parameters:
  • inputs – The inputs to decode, can be a 3D (training) or 2D (iterative decoding) tensor.
  • length_or_step – For 3D inputs, the length of each sequence. For 2D inputs, the current decoding timestep.
  • state – The decoder state.
  • memory – Memory values to query.
  • memory_sequence_length – Memory values length.
  • training – Run in training mode.
Returns:

A tuple with the logits, the decoder state, and an attention vector.

Raises:
  • RuntimeError – if the decoder was not initialized.
  • ValueError – if the rank of inputs is neither 2 nor 3.
  • ValueError – if length_or_step is invalid.
  • ValueError – if the number of source contexts (memory) does not match the number defined at the decoder initialization.
forward(inputs, sequence_length=None, initial_state=None, memory=None, memory_sequence_length=None, training=None)[source]

Runs the decoder on full sequences.

Parameters:
  • inputs – The 3D decoder input.
  • sequence_length – The length of each input sequence.
  • initial_state – The initial decoder state.
  • memory – Memory values to query.
  • memory_sequence_length – Memory values length.
  • training – Run in training mode.
Returns:

A tuple with the decoder outputs, the decoder state, and the attention vector.
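A sketch of full-sequence decoding with a DecoderV2, assuming eager execution and a concrete implementation such as SelfAttentionDecoderV2 from opennmt.decoders.self_attention_decoder (the class name, constructor arguments, and sizes are assumptions):

    import tensorflow as tf
    from opennmt.decoders.self_attention_decoder import SelfAttentionDecoderV2

    decoder = SelfAttentionDecoderV2(2, num_units=64, num_heads=4, ffn_inner_dim=64)
    decoder.initialize(vocab_size=100)

    inputs = tf.random.normal([4, 6, 64])            # embedded target tokens [B, T, depth]
    sequence_length = tf.constant([6, 5, 4, 6])
    memory = tf.random.normal([4, 10, 64])           # encoder outputs
    memory_sequence_length = tf.constant([10, 8, 10, 7])

    outputs, state, attention = decoder.forward(
        inputs,
        sequence_length=sequence_length,
        memory=memory,
        memory_sequence_length=memory_sequence_length,
        training=True)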

step(inputs, timestep, state=None, memory=None, memory_sequence_length=None, training=None)[source]

Runs one decoding step.

Parameters:
  • inputs – The 2D decoder input.
  • timestep – The current decoding step.
  • state – The decoder state.
  • memory – Memory values to query.
  • memory_sequence_length – Memory values length.
  • training – Run in training mode.
Returns:

A tuple with the decoder outputs, the decoder state, and the attention vector.
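A greedy decoding loop sketch built on step, with the same assumptions as the previous snippet; the start token id and the argmax readout are illustrative:

    import tensorflow as tf
    from opennmt.decoders.self_attention_decoder import SelfAttentionDecoderV2

    decoder = SelfAttentionDecoderV2(2, num_units=64, num_heads=4, ffn_inner_dim=64)
    decoder.initialize(vocab_size=100)

    embedding = tf.Variable(tf.random.normal([100, 64]))
    memory = tf.random.normal([4, 10, 64])
    memory_sequence_length = tf.constant([10, 8, 10, 7])

    state = decoder.get_initial_state(batch_size=4, dtype=tf.float32)
    ids = tf.fill([4], 1)                                # hypothetical <s> id
    for timestep in range(5):
        inputs = tf.nn.embedding_lookup(embedding, ids)  # 2D decoder input [B, depth]
        outputs, state, attention = decoder.step(
            inputs,
            timestep,
            state=state,
            memory=memory,
            memory_sequence_length=memory_sequence_length)
        # With an output layer configured, outputs are logits over the vocabulary.
        ids = tf.argmax(outputs, axis=-1, output_type=tf.int32)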