# opennmt.decoders.decoder module

Base class and functions for dynamic decoders.

opennmt.decoders.decoder.logits_to_cum_log_probs(logits, sequence_length)[source]

Returns the cumulated log probabilities of sequences.

Parameters:

- logits – The sequence of logits of shape $$[B, T, ...]$$.
- sequence_length – The length of each sequence of shape $$[B]$$.

Returns: The cumulated log probability of each sequence.
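As an illustration of the masking and summing this implies, here is a NumPy sketch. The exact reduction the library performs is an assumption here; this version sums, over the valid timesteps of each sequence, the log-probability of the most likely token at each step:

```python
import numpy as np

def log_softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def cum_log_probs(logits, sequence_length):
    # logits: [B, T, vocab]; sequence_length: [B].
    batch, max_time, _ = logits.shape
    # Per-step log-probability of the most likely token (greedy path).
    lp = log_softmax(logits).max(axis=-1)                       # [B, T]
    # Mask out positions beyond each sequence's length, then sum over time.
    mask = np.arange(max_time)[None, :] < np.asarray(sequence_length)[:, None]
    return (lp * mask).sum(axis=1)                              # [B]
```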
opennmt.decoders.decoder.get_embedding_fn(embedding)[source]

Returns the embedding function.

Parameters:

- embedding – The embedding tensor or a callable that takes word ids.

Returns: A callable that takes word ids.
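The contract is simple to sketch in NumPy: a callable is passed through unchanged, while a plain embedding matrix is wrapped in a row lookup (a stand-in for tf.nn.embedding_lookup):

```python
import numpy as np

def get_embedding_fn(embedding):
    if callable(embedding):
        # Already a function of word ids: use it unchanged.
        return embedding
    # Otherwise wrap the embedding matrix in a row lookup.
    return lambda ids: np.take(embedding, ids, axis=0)
```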
opennmt.decoders.decoder.build_output_layer(num_units, vocab_size, dtype=None)[source]

Builds the output projection layer.

Parameters:

- num_units – The layer input depth.
- vocab_size – The layer output depth.
- dtype – The layer dtype.

Returns: A tf.layers.Dense instance.

Raises: ValueError – if vocab_size is None.
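In NumPy terms, the returned layer is just a dense projection from the decoder depth to vocabulary logits. This sketch stands in for the tf.layers.Dense instance; the weight initialization is arbitrary:

```python
import numpy as np

def build_output_layer(num_units, vocab_size, seed=0):
    if vocab_size is None:
        raise ValueError("vocab_size must be set to build the output layer")
    rng = np.random.default_rng(seed)
    weight = rng.standard_normal((num_units, vocab_size)) * 0.01
    bias = np.zeros(vocab_size)
    # Affine projection: [.., num_units] -> [.., vocab_size] logits.
    return lambda inputs: inputs @ weight + bias
```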
opennmt.decoders.decoder.get_sampling_probability(global_step, read_probability=None, schedule_type=None, k=None)[source]

Returns the sampling probability as described in https://arxiv.org/abs/1506.03099.

Parameters:

- global_step – The training step.
- read_probability – The probability to read from the inputs.
- schedule_type – The type of schedule.
- k – The convergence constant.

Returns: The probability to sample from the output ids as a 0D tf.Tensor, or None if scheduled sampling is not configured.

Raises:

- ValueError – if schedule_type is set but not k, or if schedule_type is linear but an initial read_probability is not set.
- TypeError – if schedule_type is invalid.
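The schedules follow Bengio et al. (2015). The sketch below gives one plausible parameterization; the exact formulas and the "constant" branch are assumptions, but the error cases mirror the ones documented above:

```python
import math

def get_sampling_probability(step, read_probability=None,
                             schedule_type=None, k=None):
    # Returns the probability of sampling from the model's own outputs
    # (1 - probability of reading the gold inputs).
    if schedule_type is None or schedule_type == "constant":
        if read_probability is None:
            return None
        return 1.0 - read_probability
    if k is None:
        raise ValueError("k must be set when schedule_type is set")
    if schedule_type == "linear":
        if read_probability is None:
            raise ValueError("an initial read_probability is required "
                             "for the linear schedule")
        read = max(0.0, read_probability - k * step)
    elif schedule_type == "exponential":
        read = k ** step                        # assumes 0 < k < 1
    elif schedule_type == "inverse_sigmoid":
        read = k / (k + math.exp(step / k))     # assumes k >= 1
    else:
        raise TypeError("unknown schedule type: %s" % schedule_type)
    return 1.0 - read
```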
class opennmt.decoders.decoder.Decoder[source]

Bases: object

Base class for decoders.

decode(inputs, sequence_length, vocab_size=None, initial_state=None, sampling_probability=None, embedding=None, output_layer=None, mode='train', memory=None, memory_sequence_length=None, return_alignment_history=False)[source]

Decodes a full input sequence.

Usually used for training and evaluation where target sequences are known.

Parameters:

- inputs – The input to decode of shape $$[B, T, ...]$$.
- sequence_length – The length of each input with shape $$[B]$$.
- vocab_size – The output vocabulary size. Must be set if output_layer is not set.
- initial_state – The initial state as a (possibly nested tuple of…) tensors.
- sampling_probability – The probability of sampling categorically from the output ids instead of reading directly from the inputs.
- embedding – The embedding tensor or a callable that takes word ids. Must be set when sampling_probability is set.
- output_layer – Optional layer to apply to the output prior to sampling. Must be set if vocab_size is not set.
- mode – A tf.estimator.ModeKeys mode.
- memory – (optional) Memory values to query.
- memory_sequence_length – (optional) Memory values length.
- return_alignment_history – If True, also returns the alignment history from the attention layer (None will be returned if unsupported by the decoder).

Returns: A tuple (outputs, state, sequence_length), or (outputs, state, sequence_length, alignment_history) if return_alignment_history is True.
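Scheduled sampling as used by decode can be sketched as follows: at each step, with probability sampling_probability, the next input is the embedding of the model's previous prediction rather than the gold input. Here step_fn and embed are hypothetical stand-ins for the decoder cell and embedding lookup, and the coin is flipped once per timestep (the real implementation samples categorically and per batch element):

```python
import numpy as np

def decode_with_scheduled_sampling(step_fn, embed, gold_inputs,
                                   sampling_probability, rng=None):
    # gold_inputs: [B, T, depth] teacher-forcing inputs.
    if rng is None:
        rng = np.random.default_rng(0)
    batch, max_time, _ = gold_inputs.shape
    outputs = []
    next_input = gold_inputs[:, 0]
    for t in range(max_time):
        logits = step_fn(next_input)               # [B, vocab]
        outputs.append(logits)
        if t + 1 < max_time:
            if rng.random() < (sampling_probability or 0.0):
                # Feed back the embedding of the model's own prediction.
                next_input = embed(logits.argmax(axis=-1))
            else:
                # Read the gold input (teacher forcing).
                next_input = gold_inputs[:, t + 1]
    return np.stack(outputs, axis=1)               # [B, T, vocab]
```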
support_alignment_history

Returns True if this decoder can return the attention as alignment history.

support_multi_source

Returns True if this decoder supports multiple source contexts.

dynamic_decode(embedding, start_tokens, end_token, vocab_size=None, initial_state=None, output_layer=None, maximum_iterations=250, minimum_length=0, mode='infer', memory=None, memory_sequence_length=None, dtype=None, return_alignment_history=False, sample_from=None, sample_temperature=None)[source]

Decodes dynamically from start_tokens with greedy search.

Usually used for inference.

Parameters:

- embedding – The embedding tensor or a callable that takes word ids.
- start_tokens – The start token ids with shape $$[B]$$.
- end_token – The end token id.
- vocab_size – The output vocabulary size. Must be set if output_layer is not set.
- initial_state – The initial state as a (possibly nested tuple of…) tensors.
- output_layer – Optional layer to apply to the output prior to sampling. Must be set if vocab_size is not set.
- maximum_iterations – The maximum number of decoding iterations.
- minimum_length – The minimum length of decoded sequences (end_token excluded).
- mode – A tf.estimator.ModeKeys mode.
- memory – (optional) Memory values to query.
- memory_sequence_length – (optional) Memory values length.
- dtype – The data type. Required if memory is None.
- return_alignment_history – If True, also returns the alignment history from the attention layer (None will be returned if unsupported by the decoder).
- sample_from – Sample predictions from the sample_from most likely tokens. If 0, sample from the full output distribution.
- sample_temperature – Value dividing logits. In random sampling, a high value generates more random samples.

Returns: A tuple (predicted_ids, state, sequence_length, log_probs), or (predicted_ids, state, sequence_length, log_probs, alignment_history) if return_alignment_history is True.
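The greedy loop behind dynamic_decode can be sketched in NumPy: start from start_tokens, feed back the argmax prediction, stop each sequence once it emits end_token, and accumulate per-step log-probabilities. step_fn is a hypothetical per-step function standing in for the decoder cell:

```python
import numpy as np

def greedy_decode(step_fn, embed, start_tokens, end_token,
                  maximum_iterations=250):
    # step_fn(inputs, t) -> [B, vocab] logits; embed(ids) -> [B, depth].
    batch = len(start_tokens)
    inputs = embed(np.asarray(start_tokens))
    finished = np.zeros(batch, dtype=bool)
    predicted_ids = []
    log_probs = np.zeros(batch)
    for t in range(maximum_iterations):
        logits = step_fn(inputs, t)
        # Stable log-softmax, then accumulate the chosen token's
        # log-probability for sequences that have not finished yet.
        shifted = logits - logits.max(axis=-1, keepdims=True)
        lp = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
        ids = logits.argmax(axis=-1)
        log_probs += np.where(finished, 0.0, lp[np.arange(batch), ids])
        predicted_ids.append(ids)
        finished |= ids == end_token
        if finished.all():
            break
        inputs = embed(ids)
    return np.stack(predicted_ids, axis=1), log_probs   # [B, T'], [B]
```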

dynamic_decode_and_search(embedding, start_tokens, end_token, vocab_size=None, initial_state=None, output_layer=None, beam_width=5, length_penalty=0.0, maximum_iterations=250, minimum_length=0, mode='infer', memory=None, memory_sequence_length=None, dtype=None, return_alignment_history=False, sample_from=None, sample_temperature=None, coverage_penalty=0.0)[source]

Decodes dynamically from start_tokens with beam search.

Usually used for inference.

Parameters:

- embedding – The embedding tensor or a callable that takes word ids.
- start_tokens – The start token ids with shape $$[B]$$.
- end_token – The end token id.
- vocab_size – The output vocabulary size. Must be set if output_layer is not set.
- initial_state – The initial state as a (possibly nested tuple of…) tensors.
- output_layer – Optional layer to apply to the output prior to sampling. Must be set if vocab_size is not set.
- beam_width – The width of the beam.
- length_penalty – The length penalty weight during beam search.
- maximum_iterations – The maximum number of decoding iterations.
- minimum_length – The minimum length of decoded sequences (end_token excluded).
- mode – A tf.estimator.ModeKeys mode.
- memory – (optional) Memory values to query.
- memory_sequence_length – (optional) Memory values length.
- dtype – The data type. Required if memory is None.
- return_alignment_history – If True, also returns the alignment history from the attention layer (None will be returned if unsupported by the decoder).
- sample_from – Sample predictions from the sample_from most likely tokens. If 0, sample from the full output distribution.
- sample_temperature – Value dividing logits. In random sampling, a high value generates more random samples.
- coverage_penalty – The coverage penalty weight during beam search.

Returns: A tuple (predicted_ids, state, sequence_length, log_probs), or (predicted_ids, state, sequence_length, log_probs, alignment_history) if return_alignment_history is True.
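A toy single-example beam search illustrating the contract above, with a GNMT-style length penalty ((5 + len) / 6) ** length_penalty applied to finished hypotheses. step_logits is a hypothetical function mapping a token-id prefix to next-token logits; the real implementation batches this over [B * beam_width] and also supports a coverage penalty:

```python
import numpy as np

def beam_search(step_logits, start_id, end_token, beam_width=5,
                length_penalty=0.0, maximum_iterations=20):
    def log_softmax(x):
        x = x - x.max()
        return x - np.log(np.exp(x).sum())

    beams = [([start_id], 0.0)]          # (prefix, cumulative log-prob)
    finished = []
    for _ in range(maximum_iterations):
        # Expand every live beam with its top beam_width continuations.
        candidates = []
        for prefix, score in beams:
            lp = log_softmax(step_logits(prefix))
            for tok in np.argsort(lp)[::-1][:beam_width]:
                candidates.append((prefix + [int(tok)], score + lp[tok]))
        candidates.sort(key=lambda c: c[1], reverse=True)
        # Keep the best candidates; move ended hypotheses to finished,
        # rescored with the length penalty.
        beams = []
        for prefix, score in candidates[:beam_width]:
            if prefix[-1] == end_token:
                penalty = ((5.0 + len(prefix)) / 6.0) ** length_penalty
                finished.append((prefix, score / penalty))
            else:
                beams.append((prefix, score))
        if not beams:
            break
    if not finished:
        finished = beams
    return max(finished, key=lambda c: c[1])
```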
decode_from_inputs(inputs, sequence_length, initial_state=None, mode='train', memory=None, memory_sequence_length=None)[source]

Decodes from full inputs.

Parameters:

- inputs – The input to decode of shape $$[B, T, ...]$$.
- sequence_length – The length of each input with shape $$[B]$$.
- initial_state – The initial state as a (possibly nested tuple of…) tensors.
- mode – A tf.estimator.ModeKeys mode.
- memory – (optional) Memory values to query.
- memory_sequence_length – (optional) Memory values length.

Returns: A tuple (outputs, state), or (outputs, state, attention) if self.support_alignment_history.
step_fn(mode, batch_size, initial_state=None, memory=None, memory_sequence_length=None, dtype=tf.float32)[source]

Callable to run decoding steps.

Parameters:

- mode – A tf.estimator.ModeKeys mode.
- batch_size – The batch size.
- initial_state – The initial state to start from as a (possibly nested tuple of…) tensors.
- memory – (optional) Memory values to query.
- memory_sequence_length – (optional) Memory values length.
- dtype – The data type.

Returns: A callable with the signature (step, inputs, state, mode) -> (outputs, state), or (outputs, state, attention) if self.support_alignment_history.
class opennmt.decoders.decoder.DecoderV2(num_sources=1, **kwargs)[source]

Bases: tensorflow.python.keras.engine.base_layer.Layer

Base class for decoders.

Note

TensorFlow 2.0 version.

__init__(num_sources=1, **kwargs)[source]

Initializes the decoder parameters.

Parameters:

- num_sources – The number of source contexts expected by this decoder.
- **kwargs – Additional layer arguments.

Raises: ValueError – if the number of source contexts num_sources is not supported by this decoder.
minimum_sources

The minimum number of source contexts supported by this decoder.

maximum_sources

The maximum number of source contexts supported by this decoder.

initialize(vocab_size=None, output_layer=None)[source]

Initializes the decoder configuration.

Parameters:

- vocab_size – The target vocabulary size.
- output_layer – The output layer to use.

Raises: ValueError – if neither vocab_size nor output_layer is set.
get_initial_state(initial_state=None, batch_size=None, dtype=tf.float32)[source]

Returns the initial decoder state.

Parameters:

- initial_state – An initial state to start from, e.g. the last encoder state.
- batch_size – The batch size to use.
- dtype – The dtype of the state.

Returns: A nested structure of tensors representing the decoder state.

Raises:

- RuntimeError – if the decoder was not initialized.
- ValueError – if batch_size or dtype is not set and initial_state is not passed.
call(inputs, length_or_step=None, state=None, memory=None, memory_sequence_length=None, training=None)[source]

Runs the decoder layer on either a complete sequence (e.g. for training or scoring), or a single timestep (e.g. for iterative decoding).

Parameters:

- inputs – The inputs to decode, can be a 3D (training) or 2D (iterative decoding) tensor.
- length_or_step – For 3D inputs, the length of each sequence. For 2D inputs, the current decoding timestep.
- state – The decoder state.
- memory – Memory values to query.
- memory_sequence_length – Memory values length.
- training – Run in training mode.

Returns: A tuple with the logits, the decoder state, and an attention vector.

Raises:

- RuntimeError – if the decoder was not initialized.
- ValueError – if the rank of inputs is not 2 or 3.
- ValueError – if length_or_step is invalid.
- ValueError – if the number of source contexts (memory) does not match the number defined at decoder initialization.
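The rank dispatch can be sketched as follows, with decoder_forward and decoder_step as hypothetical stand-ins for the full-sequence and single-step code paths:

```python
import numpy as np

def call(decoder_forward, decoder_step, inputs, length_or_step):
    inputs = np.asarray(inputs)
    if inputs.ndim == 3:
        # [B, T, depth]: length_or_step is the length of each sequence.
        return decoder_forward(inputs, sequence_length=length_or_step)
    if inputs.ndim == 2:
        # [B, depth]: length_or_step is the scalar decoding timestep.
        return decoder_step(inputs, timestep=length_or_step)
    raise ValueError("inputs must be a 2D or 3D tensor")
```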
forward(inputs, sequence_length=None, initial_state=None, memory=None, memory_sequence_length=None, training=None)[source]

Runs the decoder on full sequences.

Parameters:

- inputs – The 3D decoder input.
- sequence_length – The length of each input sequence.
- initial_state – The initial decoder state.
- memory – Memory values to query.
- memory_sequence_length – Memory values length.
- training – Run in training mode.

Returns: A tuple with the decoder outputs, the decoder state, and the attention vector.
step(inputs, timestep, state=None, memory=None, memory_sequence_length=None, training=None)[source]

Runs one decoding step.

Parameters:

- inputs – The 2D decoder input.
- timestep – The current decoding step.
- state – The decoder state.
- memory – Memory values to query.
- memory_sequence_length – Memory values length.
- training – Run in training mode.

Returns: A tuple with the decoder outputs, the decoder state, and the attention vector.