opennmt.utils.decoding module

Dynamic decoding utilities.

class opennmt.utils.decoding.Sampler[source]

Bases: object

Base class for samplers.

__call__(scores, num_samples=1)[source]

Samples predictions.

Parameters:
  • scores – The scores to sample from, a tensor of shape [batch_size, vocab_size].
  • num_samples – The number of samples to produce for each batch entry.
Returns:
  • sample_ids – The sampled ids.
  • sample_scores – The sampled scores.

class opennmt.utils.decoding.RandomSampler(from_top_k=None, temperature=None)[source]

Bases: opennmt.utils.decoding.Sampler

Randomly samples from model outputs.

__init__(from_top_k=None, temperature=None)[source]

Initializes the random sampler.

Parameters:
  • from_top_k – Sample from the top K predictions instead of the full distribution.
  • temperature – Divide the logits by this value before sampling. Higher temperatures produce more random samples.

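To make the effect of the two parameters concrete, here is a minimal pure-Python sketch of top-K sampling with temperature. It is not the library's implementation: the function name random_sample, the rng argument, and the use of plain lists instead of tensors are assumptions made for illustration, and returning the temperature-scaled score is an arbitrary choice.

```python
import math
import random


def random_sample(scores, from_top_k=None, temperature=None, rng=random):
    """Draws one token id from a list of unnormalized scores (logits)."""
    # Temperature scaling: divide every logit by the temperature.
    if temperature is not None:
        scores = [s / temperature for s in scores]

    # Optionally keep only the K highest-scoring ids.
    candidates = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    if from_top_k is not None:
        candidates = candidates[:from_top_k]

    # Softmax numerators over the candidates (shifted by the max for stability).
    max_score = max(scores[i] for i in candidates)
    weights = [math.exp(scores[i] - max_score) for i in candidates]

    # random.choices samples proportionally to the weights.
    sample_id = rng.choices(candidates, weights=weights, k=1)[0]
    return sample_id, scores[sample_id]


rng = random.Random(0)  # seeded generator so the sketch is reproducible
logits = [2.0, 1.0, 0.5, -1.0]
sample_id, sample_score = random_sample(logits, from_top_k=2, temperature=0.7, rng=rng)
```

With from_top_k=2 only ids 0 and 1 can be drawn; lowering the temperature below 1 sharpens the distribution toward id 0, while raising it flattens the distribution.
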
class opennmt.utils.decoding.BestSampler[source]

Bases: opennmt.utils.decoding.Sampler

Samples the best predictions.
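As an illustration of what selecting the best predictions can look like, the sketch below picks the num_samples highest-scoring ids from each row of a [batch_size, vocab_size] score matrix. The function name best_sample and the use of nested lists instead of tensors are assumptions; the library's own implementation may differ.

```python
def best_sample(scores, num_samples=1):
    """Returns the ids and scores of the num_samples highest entries per row."""
    sample_ids, sample_scores = [], []
    for row in scores:  # scores has shape [batch_size, vocab_size]
        # Indices of the row sorted by decreasing score, truncated to num_samples.
        top = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:num_samples]
        sample_ids.append(top)
        sample_scores.append([row[i] for i in top])
    return sample_ids, sample_scores


scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.9]]
ids, top_scores = best_sample(scores, num_samples=2)
# ids == [[1, 2], [2, 0]]
```
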

class opennmt.utils.decoding.DecodingStrategy[source]

Bases: object

Base class for decoding strategies.

num_hypotheses

The number of hypotheses returned by this strategy.

initialize(batch_size, start_ids, attention_size=None)[source]

Initializes the strategy.

Parameters:
  • batch_size – The batch size.
  • start_ids – The start decoding ids.
  • attention_size – If known, the size of the attention vectors (i.e. the maximum source length).
Returns:
  • start_ids – The (possibly transformed) start decoding ids.
  • finished – The tensor of finished flags.
  • initial_log_probs – Initial log probabilities per batch.
  • extra_vars – A sequence of additional tensors used during the decoding.

step(step, sampler, log_probs, cum_log_probs, finished, state, extra_vars, attention=None)[source]

Updates the strategy state.

Parameters:
  • step – The current decoding step.
  • sampler – The sampler that produces predictions.
  • log_probs – The model log probabilities.
  • cum_log_probs – The cumulated log probabilities per batch.
  • finished – The current finished flags.
  • state – The decoder state.
  • extra_vars – Additional tensors from this decoding strategy.
  • attention – The attention vector for the current step.
Returns:
  • ids – The predicted word ids.
  • cum_log_probs – The new cumulated log probabilities.
  • finished – The updated finished flags.
  • state – The updated decoder state.
  • extra_vars – Additional tensors from this decoding strategy.

finalize(outputs, end_id, extra_vars, attention=None)[source]

Finalizes the predictions.

Parameters:
  • outputs – The array of sampled ids.
  • end_id – The end token id.
  • extra_vars – Additional tensors from this decoding strategy.
  • attention – The array of attention outputs.
Returns:
  • final_ids – The final predictions as a tensor of shape [B, H, T].
  • final_attention – The final attention history of shape [B, H, T, S].
  • final_lengths – The final sequence lengths of shape [B, H].
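The output shapes can be illustrated with a small pure-Python sketch that turns per-step sampled ids into [B, H, T] predictions and [B, H] lengths for a single hypothesis (H = 1). This is not the library's code: the function name, the assumption that outputs is indexed by step and then by batch entry, and the choice to exclude the end token from the length are all illustrative assumptions.

```python
def finalize(step_outputs, end_id):
    """Turns per-step ids into [B, H, T] predictions and [B, H] lengths (H = 1)."""
    batch_size = len(step_outputs[0])
    final_ids, final_lengths = [], []
    for b in range(batch_size):
        sequence = [step[b] for step in step_outputs]  # length T
        # Assumption: the length stops before the first end token.
        length = sequence.index(end_id) if end_id in sequence else len(sequence)
        final_ids.append([sequence])      # wrap in a hypothesis axis of size 1
        final_lengths.append([length])
    return final_ids, final_lengths


# Three decoding steps for a batch of two sentences; end_id is 2.
step_outputs = [[5, 7], [6, 2], [2, 2]]
final_ids, final_lengths = finalize(step_outputs, end_id=2)
# final_ids == [[[5, 6, 2]], [[7, 2, 2]]], final_lengths == [[2], [1]]
```
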

class opennmt.utils.decoding.GreedySearch[source]

Bases: opennmt.utils.decoding.DecodingStrategy

A basic greedy search strategy.

initialize(batch_size, start_ids, attention_size=None)[source]

Initializes the strategy.

Parameters:
  • batch_size – The batch size.
  • start_ids – The start decoding ids.
  • attention_size – If known, the size of the attention vectors (i.e. the maximum source length).
Returns:
  • start_ids – The (possibly transformed) start decoding ids.
  • finished – The tensor of finished flags.
  • initial_log_probs – Initial log probabilities per batch.
  • extra_vars – A sequence of additional tensors used during the decoding.

step(step, sampler, log_probs, cum_log_probs, finished, state, extra_vars, attention=None)[source]

Updates the strategy state.

Parameters:
  • step – The current decoding step.
  • sampler – The sampler that produces predictions.
  • log_probs – The model log probabilities.
  • cum_log_probs – The cumulated log probabilities per batch.
  • finished – The current finished flags.
  • state – The decoder state.
  • extra_vars – Additional tensors from this decoding strategy.
  • attention – The attention vector for the current step.
Returns:
  • ids – The predicted word ids.
  • cum_log_probs – The new cumulated log probabilities.
  • finished – The updated finished flags.
  • state – The updated decoder state.
  • extra_vars – Additional tensors from this decoding strategy.

finalize(outputs, end_id, extra_vars, attention=None)[source]

Finalizes the predictions.

Parameters:
  • outputs – The array of sampled ids.
  • end_id – The end token id.
  • extra_vars – Additional tensors from this decoding strategy.
  • attention – The array of attention outputs.
Returns:
  • final_ids – The final predictions as a tensor of shape [B, H, T].
  • final_attention – The final attention history of shape [B, H, T, S].
  • final_lengths – The final sequence lengths of shape [B, H].
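The idea behind one greedy step can be sketched in pure Python: for each batch entry, pick the highest-scoring id, add its log probability to the running total, and mark the entry as finished once the end token is produced. The function name greedy_step, the explicit end_id argument, and the handling of already finished entries are assumptions made for illustration, not the library's implementation.

```python
def greedy_step(log_probs, cum_log_probs, finished, end_id):
    """One greedy update for a batch: pick the argmax id and update the state."""
    ids, new_cum_log_probs, new_finished = [], [], []
    for row, total, done in zip(log_probs, cum_log_probs, finished):
        if done:
            # Assumption: finished entries keep emitting end_id and stop scoring.
            ids.append(end_id)
            new_cum_log_probs.append(total)
            new_finished.append(True)
            continue
        best_id = max(range(len(row)), key=lambda i: row[i])
        ids.append(best_id)
        new_cum_log_probs.append(total + row[best_id])
        new_finished.append(best_id == end_id)
    return ids, new_cum_log_probs, new_finished


log_probs = [[-2.0, -0.5, -1.5], [-0.1, -3.0, -2.5]]
ids, cum, finished = greedy_step(log_probs, [0.0, -1.0], [False, True], end_id=2)
# ids == [1, 2], cum == [-0.5, -1.0], finished == [False, True]
```
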

class opennmt.utils.decoding.BeamSearch(beam_size, length_penalty=0, coverage_penalty=0)[source]

Bases: opennmt.utils.decoding.DecodingStrategy

A beam search strategy.

num_hypotheses

The number of hypotheses returned by this strategy.

initialize(batch_size, start_ids, attention_size=None)[source]

Initializes the strategy.

Parameters:
  • batch_size – The batch size.
  • start_ids – The start decoding ids.
  • attention_size – If known, the size of the attention vectors (i.e. the maximum source length).
Returns:
  • start_ids – The (possibly transformed) start decoding ids.
  • finished – The tensor of finished flags.
  • initial_log_probs – Initial log probabilities per batch.
  • extra_vars – A sequence of additional tensors used during the decoding.

step(step, sampler, log_probs, cum_log_probs, finished, state, extra_vars, attention=None)[source]

Updates the strategy state.

Parameters:
  • step – The current decoding step.
  • sampler – The sampler that produces predictions.
  • log_probs – The model log probabilities.
  • cum_log_probs – The cumulated log probabilities per batch.
  • finished – The current finished flags.
  • state – The decoder state.
  • extra_vars – Additional tensors from this decoding strategy.
  • attention – The attention vector for the current step.
Returns:
  • ids – The predicted word ids.
  • cum_log_probs – The new cumulated log probabilities.
  • finished – The updated finished flags.
  • state – The updated decoder state.
  • extra_vars – Additional tensors from this decoding strategy.

finalize(outputs, end_id, extra_vars, attention=None)[source]

Finalizes the predictions.

Parameters:
  • outputs – The array of sampled ids.
  • end_id – The end token id.
  • extra_vars – Additional tensors from this decoding strategy.
  • attention – The array of attention outputs.
Returns:
  • final_ids – The final predictions as a tensor of shape [B, H, T].
  • final_attention – The final attention history of shape [B, H, T, S].
  • final_lengths – The final sequence lengths of shape [B, H].
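For readers unfamiliar with beam search, the core of one step can be sketched as follows: every current hypothesis is extended with every vocabulary id, and only the beam_size best continuations are kept. This pure-Python sketch handles a single sentence and ignores length_penalty, coverage_penalty, and finished hypotheses; the function name beam_step and its return values are assumptions, not the library's implementation.

```python
def beam_step(log_probs, cum_log_probs, beam_size):
    """Expands every beam and keeps the beam_size best continuations.

    log_probs: [beam_size][vocab_size] log probabilities for one sentence.
    cum_log_probs: [beam_size] running totals of the current hypotheses.
    """
    candidates = []
    for beam, (row, total) in enumerate(zip(log_probs, cum_log_probs)):
        for word_id, log_prob in enumerate(row):
            candidates.append((total + log_prob, beam, word_id))

    # Keep the highest-scoring (parent beam, word id) pairs.
    candidates.sort(key=lambda c: c[0], reverse=True)
    best = candidates[:beam_size]

    word_ids = [word_id for _, _, word_id in best]
    parent_beams = [beam for _, beam, _ in best]
    new_cum_log_probs = [score for score, _, _ in best]
    return word_ids, parent_beams, new_cum_log_probs


log_probs = [[-0.25, -1.5, -3.0], [-0.5, -2.0, -4.0]]
word_ids, parents, cum = beam_step(log_probs, [-1.0, -0.5], beam_size=2)
# word_ids == [0, 0], parents == [1, 0], cum == [-1.0, -1.25]
```

The parent indices show why a beam search must also reorder the decoder state at each step: the best continuation here extends beam 1, not beam 0.
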

opennmt.utils.decoding.dynamic_decode(symbols_to_logits_fn, start_ids, end_id=2, initial_state=None, decoding_strategy=None, sampler=None, maximum_iterations=None, minimum_iterations=0, attention_history=False, attention_size=None)[source]

Dynamic decoding.

Parameters:
  • symbols_to_logits_fn – A callable taking (symbols, step, state) and returning (logits, state, attention) (attention is optional).
  • start_ids – Initial input IDs of shape [B].
  • end_id – ID of the end of sequence token.
  • initial_state – Initial decoder state.
  • decoding_strategy – An opennmt.utils.decoding.DecodingStrategy instance that defines the decoding logic. Defaults to greedy search.
  • sampler – An opennmt.utils.decoding.Sampler instance that samples predictions from the model output. Defaults to argmax sampling.
  • maximum_iterations – The maximum number of iterations to decode for.
  • minimum_iterations – The minimum number of iterations to decode for.
  • attention_history – Gather attention history during the decoding.
  • attention_size – If known, the size of the attention vectors (i.e. the maximum source length).
Returns:
  • ids – The predicted ids of shape [B, H, T].
  • lengths – The lengths of the produced sequences, of shape [B, H].
  • log_probs – The cumulated log probabilities of shape [B, H].
  • attention_history – The attention history of shape [B, H, T_t, T_s].
  • state – The final decoding state.
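The contract of symbols_to_logits_fn and the role of the iteration bounds can be illustrated with a simplified, pure-Python greedy loop. This is a sketch under stated assumptions, not the library's implementation: greedy_decode and toy_model are hypothetical names, the callable returns only (logits, state) without the optional attention, the end token is masked until minimum_iterations is reached, and the reported length excludes the end token.

```python
import math


def greedy_decode(symbols_to_logits_fn, start_ids, end_id=2, initial_state=None,
                  maximum_iterations=10, minimum_iterations=0):
    """Greedy decoding loop for a batch, returning ids, lengths and log probs."""
    batch_size = len(start_ids)
    symbols, state = list(start_ids), initial_state
    outputs = [[] for _ in range(batch_size)]
    cum_log_probs = [0.0] * batch_size
    finished = [False] * batch_size

    for step in range(maximum_iterations):
        logits, state = symbols_to_logits_fn(symbols, step, state)
        for b, row in enumerate(logits):
            if finished[b]:
                continue
            # Log-softmax of the logits.
            log_norm = math.log(sum(math.exp(x) for x in row))
            log_probs = [x - log_norm for x in row]
            # Assumption: forbid the end token until the minimum length is reached.
            if step < minimum_iterations:
                log_probs[end_id] = -math.inf
            best = max(range(len(row)), key=lambda i: log_probs[i])
            outputs[b].append(best)
            cum_log_probs[b] += log_probs[best]
            symbols[b] = best
            finished[b] = best == end_id
        if all(finished):
            break

    # Assumption: lengths count the tokens other than the end token.
    lengths = [sum(1 for t in seq if t != end_id) for seq in outputs]
    return outputs, lengths, cum_log_probs


def toy_model(symbols, step, state):
    """Hypothetical model: favors id 3 on the first two steps, then end id 2."""
    target = 3 if step < 2 else 2
    logits = [[5.0 if i == target else 0.0 for i in range(4)] for _ in symbols]
    return logits, state


ids, lengths, log_probs = greedy_decode(toy_model, start_ids=[1, 1])
# ids == [[3, 3, 2], [3, 3, 2]], lengths == [2, 2]
```

In the real function the same loop is driven by a decoding strategy and a sampler, so swapping greedy search for beam search changes the number of hypotheses H without changing the callable.
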