Translation

Translations

class onmt.translate.Translation(src, src_raw, pred_sents, attn, pred_scores, tgt_sent, gold_score, word_aligns)[source]

Bases: object

Container for a translated sentence.

Variables
  • src (LongTensor) – Source word IDs.

  • src_raw (List[str]) – Raw source words.

  • pred_sents (List[List[str]]) – Words from the n-best translations.

  • pred_scores (List[List[float]]) – Log-probs of n-best translations.

  • attns (List[FloatTensor]) – Attention distribution for each translation.

  • gold_sent (List[str]) – Words from gold translation.

  • gold_score (List[float]) – Log-prob of gold translation.

  • word_aligns (List[FloatTensor]) – Word alignment distribution for each translation.

log(sent_number)[source]

Log translation.
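
A minimal, hedged sketch of reading a Translation. In practice these objects come out of TranslationBuilder (below); here one is constructed by hand with dummy values purely for illustration.

    import torch
    from onmt.translate import Translation

    # Dummy values purely for illustration; real instances are produced
    # by TranslationBuilder.from_batch().
    t = Translation(
        src=torch.LongTensor([3, 4, 5]), src_raw=["a", "b", "c"],
        pred_sents=[["x", "y"]], attn=[torch.zeros(2, 3)],
        pred_scores=[-1.23], tgt_sent=None, gold_score=0.0,
        word_aligns=None)
    print(t.pred_sents[0], t.pred_scores[0])  # best hypothesis and its log-prob
    print(t.log(sent_number=1))               # human-readable summary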

Translator Class

class onmt.translate.Translator(model, fields, src_reader, tgt_reader, gpu=-1, n_best=1, min_length=0, max_length=100, ratio=0.0, beam_size=30, random_sampling_topk=1, random_sampling_temp=1, stepwise_penalty=None, dump_beam=False, block_ngram_repeat=0, ignore_when_blocking=frozenset({}), replace_unk=False, phrase_table='', data_type='text', verbose=False, report_time=False, copy_attn=False, global_scorer=None, out_file=None, report_align=False, report_score=True, logger=None, seed=-1)[source]

Bases: object

Translate a batch of sentences with a saved model.

Parameters
  • model (onmt.modules.NMTModel) – NMT model to use for translation

  • fields (dict[str, torchtext.data.Field]) – A dict mapping each side to its list of name-Field pairs.

  • src_reader (onmt.inputters.DataReaderBase) – Source reader.

  • tgt_reader (onmt.inputters.TextDataReader) – Target reader.

  • gpu (int) – GPU device. Set to negative for no GPU.

  • n_best (int) – How many finished beams to wait for; this is the number of hypotheses returned per input.

  • min_length (int) – See onmt.translate.decode_strategy.DecodeStrategy.

  • max_length (int) – See onmt.translate.decode_strategy.DecodeStrategy.

  • beam_size (int) – Number of beams.

  • random_sampling_topk (int) – See onmt.translate.greedy_search.GreedySearch.

  • random_sampling_temp (int) – See onmt.translate.greedy_search.GreedySearch.

  • stepwise_penalty (bool) – Whether coverage penalty is applied every step or not.

  • dump_beam (bool) – Debugging option.

  • block_ngram_repeat (int) – See onmt.translate.decode_strategy.DecodeStrategy.

  • ignore_when_blocking (set or frozenset) – See onmt.translate.decode_strategy.DecodeStrategy.

  • replace_unk (bool) – Replace unknown tokens with the source token that received the highest attention.

  • data_type (str) – Source data type.

  • verbose (bool) – Print/log every translation.

  • report_time (bool) – Print/log total time/frequency.

  • copy_attn (bool) – Use copy attention.

  • global_scorer (onmt.translate.GNMTGlobalScorer) – Translation scoring/reranking object.

  • out_file (TextIO or codecs.StreamReaderWriter) – Output file.

  • report_score (bool) – Whether to report scores.

  • logger (logging.Logger or NoneType) – Logger.

classmethod from_opt(model, fields, opt, model_opt, global_scorer=None, out_file=None, report_align=False, report_score=True, logger=None)[source]

Alternate constructor.

Parameters
  • model (onmt.modules.NMTModel) – See __init__().

  • fields (dict[str, torchtext.data.Field]) – See __init__().

  • opt (argparse.Namespace) – Command line options.

  • model_opt (argparse.Namespace) – Command line options saved with the model checkpoint.

  • global_scorer (onmt.translate.GNMTGlobalScorer) – See __init__().

  • out_file (TextIO or codecs.StreamReaderWriter) – See __init__().

  • report_align (bool) – See __init__().

  • report_score (bool) – See __init__().

  • logger (logging.Logger or NoneType) – See __init__().
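
A hedged construction sketch, roughly mirroring what onmt.translate.translator.build_translator does internally; the checkpoint and source paths are placeholders.

    import onmt.opts
    from onmt.model_builder import load_test_model
    from onmt.translate import Translator, GNMTGlobalScorer
    from onmt.utils.parse import ArgumentParser

    # Parse the standard translate options ('model.pt' and 'src.txt'
    # are placeholder paths).
    parser = ArgumentParser()
    onmt.opts.config_opts(parser)
    onmt.opts.translate_opts(parser)
    opt = parser.parse_args(["-model", "model.pt", "-src", "src.txt"])

    # Load the checkpoint, then build the Translator from the two
    # option namespaces.
    fields, model, model_opt = load_test_model(opt)
    translator = Translator.from_opt(
        model, fields, opt, model_opt,
        global_scorer=GNMTGlobalScorer.from_opt(opt))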

translate(src, tgt=None, src_dir=None, batch_size=None, batch_type='sents', attn_debug=False, align_debug=False, phrase_table='')[source]

Translate the content of src and compute gold scores from tgt when it is given.

Parameters
  • src – See self.src_reader.read().

  • tgt – See self.tgt_reader.read().

  • src_dir – See self.src_reader.read() (only relevant for certain types of data).

  • batch_size (int) – size of each mini-batch.

  • batch_type (str) – whether batch_size counts sentences ('sents') or tokens ('tokens').

  • attn_debug (bool) – enables the attention logging

  • align_debug (bool) – enables the word alignment logging

Returns

(list, list)

  • all_scores is a list of batch_size lists of n_best scores

  • all_predictions is a list of batch_size lists of n_best predictions
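
Continuing the hedged sketch above: for data_type='text', src is an iterable of raw sentence lines (bytes or str, whatever self.src_reader.read() accepts).

    # One source sentence; onmt.bin.translate feeds bytes lines, so we
    # do the same here.
    src = ["the quick brown fox .".encode("utf-8")]
    all_scores, all_predictions = translator.translate(src=src, batch_size=1)
    print(all_predictions[0][0])  # best hypothesis for the first sentence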

translate_batch(batch, src_vocabs, attn_debug)[source]

Translate a batch of sentences.

class onmt.translate.TranslationBuilder(data, fields, n_best=1, replace_unk=False, has_tgt=False, phrase_table='')[source]

Bases: object

Build a word-based translation from the batch output of translator and the underlying dictionaries.

Replacement based on “Addressing the Rare Word Problem in Neural Machine Translation” [LSL+15].

Parameters
  • data (onmt.inputters.Dataset) – Data.

  • fields (List[Tuple[str, torchtext.data.Field]]) – data fields

  • n_best (int) – number of translations produced

  • replace_unk (bool) – replace unknown words using attention

  • has_tgt (bool) – whether the batch will have gold targets
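
A schematic, hedged fragment of how Translator uses the builder internally; data (an onmt.inputters.Dataset), fields, batch, and translator come from the surrounding input pipeline and are not constructed here.

    from onmt.translate import TranslationBuilder

    # `data`, `fields`, `batch`, and `translator` are assumed to exist,
    # as they would inside Translator's translation loop.
    builder = TranslationBuilder(data, fields, n_best=1,
                                 replace_unk=True, has_tgt=False)
    batch_data = translator.translate_batch(batch, data.src_vocabs,
                                            attn_debug=False)
    for trans in builder.from_batch(batch_data):
        print(trans.log(sent_number=1))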

Decoding Strategies

class onmt.translate.DecodeStrategy(pad, bos, eos, batch_size, parallel_paths, min_length, block_ngram_repeat, exclusion_tokens, return_attention, max_length)[source]

Bases: object

Base class for generation strategies.

Parameters
  • pad (int) – Magic integer in output vocab.

  • bos (int) – Magic integer in output vocab.

  • eos (int) – Magic integer in output vocab.

  • batch_size (int) – Current batch size.

  • parallel_paths (int) – Decoding strategies like beam search use parallel paths. Each batch is repeated parallel_paths times in relevant state tensors.

  • min_length (int) – Shortest acceptable generation, not counting begin-of-sentence or end-of-sentence.

  • max_length (int) – Longest acceptable sequence, not counting begin-of-sentence (presumably there has been no EOS yet if max_length is used as a cutoff).

  • block_ngram_repeat (int) – Block beams where block_ngram_repeat-grams repeat.

  • exclusion_tokens (set[int]) – If a gram contains any of these tokens, it may repeat.

  • return_attention (bool) – Whether to work with attention too. If this is true, it is assumed that the decoder is attentional.

Variables
  • pad (int) – See above.

  • bos (int) – See above.

  • eos (int) – See above.

  • predictions (list[list[LongTensor]]) – For each batch, holds a list of beam prediction sequences.

  • scores (list[list[FloatTensor]]) – For each batch, holds a list of scores.

  • attention (list[list[FloatTensor or list[]]]) – For each batch, holds a list of attention sequence tensors (or empty lists) having shape (step, inp_seq_len) where inp_seq_len is the length of the sample (not the max length of all inp seqs).

  • alive_seq (LongTensor) – Shape (B x parallel_paths, step). This sequence grows in the step axis on each call to advance().

  • is_finished (ByteTensor or NoneType) – Shape (B, parallel_paths). Initialized to None.

  • alive_attn (FloatTensor or NoneType) – If tensor, shape is (step, B x parallel_paths, inp_seq_len), where inp_seq_len is the (max) length of the input sequence.

  • min_length (int) – See above.

  • max_length (int) – See above.

  • block_ngram_repeat (int) – See above.

  • exclusion_tokens (set[int]) – See above.

  • return_attention (bool) – See above.

  • done (bool) – Whether decoding has finished for every sequence in the batch.

advance(log_probs, attn)[source]

DecodeStrategy subclasses should override advance().

Advance is used to update self.alive_seq, self.is_finished, and, when appropriate, self.alive_attn.

initialize(memory_bank, src_lengths, src_map=None, device=None)[source]

DecodeStrategy subclasses should override initialize().

initialize() should be called before any other method; it prepares the state needed for decoding.

update_finished()[source]

DecodeStrategy subclasses should override update_finished().

update_finished is used to update self.predictions, self.scores, and other “output” attributes.
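
A self-contained, hedged sketch of the initialize()/advance()/update_finished() protocol, using GreedySearch (below) with random log-probs standing in for a real decoder; the token ids (pad=0, bos=1, eos=2) and sizes are toy assumptions.

    import torch
    from onmt.translate import GreedySearch

    vocab_size, batch_size, max_len = 10, 2, 8
    strategy = GreedySearch(
        pad=0, bos=1, eos=2, batch_size=batch_size, min_length=0,
        block_ngram_repeat=0, exclusion_tokens=set(),
        return_attention=False, max_length=max_len,
        sampling_temp=1.0, keep_topk=1)  # keep_topk=1 is plain greedy

    # initialize() normally receives the encoder's memory bank and source
    # lengths; both are faked since the "decoder" below ignores them.
    memory_bank = torch.zeros(5, batch_size, 4)
    src_lengths = torch.full((batch_size,), 5, dtype=torch.long)
    strategy.initialize(memory_bank, src_lengths)

    for step in range(max_len):
        # A real Translator would run its decoder on
        # strategy.current_predictions; random log-probs stand in here.
        # Note the live batch shrinks as sequences finish.
        n_alive = strategy.alive_seq.size(0)
        log_probs = torch.randn(n_alive, vocab_size).log_softmax(-1)
        strategy.advance(log_probs, None)
        if strategy.is_finished.any():
            strategy.update_finished()
            if strategy.done:
                break

    print(strategy.predictions)  # per-batch lists of predicted LongTensors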

class onmt.translate.BeamSearch(beam_size, batch_size, pad, bos, eos, n_best, global_scorer, min_length, max_length, return_attention, block_ngram_repeat, exclusion_tokens, stepwise_penalty, ratio)[source]

Bases: onmt.translate.decode_strategy.DecodeStrategy

Generation beam search.

Note that the attributes list is not exhaustive. Rather, it highlights tensors to document their shape. (Since the state variables’ “batch” size decreases as beams finish, we denote this axis with a B rather than batch_size).

Parameters
  • beam_size (int) – Number of beams to use (see base parallel_paths).

  • batch_size (int) – See base.

  • pad (int) – See base.

  • bos (int) – See base.

  • eos (int) – See base.

  • n_best (int) – Don’t stop until at least this many beams have reached EOS.

  • global_scorer (onmt.translate.GNMTGlobalScorer) – Scorer instance.

  • min_length (int) – See base.

  • max_length (int) – See base.

  • return_attention (bool) – See base.

  • block_ngram_repeat (int) – See base.

  • exclusion_tokens (set[int]) – See base.

Variables
  • top_beam_finished (ByteTensor) – Shape (B,).

  • _batch_offset (LongTensor) – Shape (B,).

  • _beam_offset (LongTensor) – Shape (batch_size x beam_size,).

  • alive_seq (LongTensor) – See base.

  • topk_log_probs (FloatTensor) – Shape (B x beam_size,). These are the scores used for the topk operation.

  • memory_lengths (LongTensor) – Lengths of encodings. Used for masking attentions.

  • select_indices (LongTensor or NoneType) – Shape (B x beam_size,). This is just a flat view of the _batch_index.

  • topk_scores (FloatTensor) – Shape (B, beam_size). These are the scores a sequence will receive if it finishes.

  • topk_ids (LongTensor) – Shape (B, beam_size). These are the word indices of the topk predictions.

  • _batch_index (LongTensor) – Shape (B, beam_size).

  • _prev_penalty (FloatTensor or NoneType) – Shape (B, beam_size). Initialized to None.

  • _coverage (FloatTensor or NoneType) – Shape (1, B x beam_size, inp_seq_len).

  • hypotheses (list[list[Tuple[Tensor]]]) – Contains a tuple of score (float), sequence (long), and attention (float or None).

advance(log_probs, attn)[source]

DecodeStrategy subclasses should override advance().

Advance is used to update self.alive_seq, self.is_finished, and, when appropriate, self.alive_attn.

initialize(memory_bank, src_lengths, src_map=None, device=None)[source]

Initialize for decoding. Repeat src objects beam_size times.

update_finished()[source]

DecodeStrategy subclasses should override update_finished().

update_finished is used to update self.predictions, self.scores, and other “output” attributes.
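
A hedged construction sketch following the signature above; the instance then drives the same initialize()/advance()/update_finished() loop shown for DecodeStrategy, with each batch repeated beam_size times.

    from onmt.translate import BeamSearch, GNMTGlobalScorer

    scorer = GNMTGlobalScorer(alpha=0.7, beta=0.0,
                              length_penalty='avg', coverage_penalty='none')
    # Toy token ids (pad=0, bos=1, eos=2); when stepped, log_probs must
    # have shape (B x beam_size, vocab_size).
    beam = BeamSearch(
        beam_size=5, batch_size=2, pad=0, bos=1, eos=2, n_best=2,
        global_scorer=scorer, min_length=0, max_length=30,
        return_attention=False, block_ngram_repeat=0,
        exclusion_tokens=set(), stepwise_penalty=False, ratio=0.0)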

onmt.translate.greedy_search.sample_with_temperature(logits, sampling_temp, keep_topk)[source]

Select next tokens randomly from the top k possible next tokens.

Samples from a categorical distribution over the keep_topk words using the category probabilities logits / sampling_temp.

Parameters
  • logits (FloatTensor) – Shaped (batch_size, vocab_size). These can be logits ((-inf, inf)) or log-probs ((-inf, 0]). (The distribution actually uses the log-probabilities logits - logits.logsumexp(-1), which equals the logits if they are log-probabilities summing to 1.)

  • sampling_temp (float) – Used to scale down logits. The higher the value, the more likely it is that a non-max word will be sampled.

  • keep_topk (int) – This many words could potentially be chosen. The other logits are set to have probability 0.

Returns

  • topk_ids: Shaped (batch_size, 1). These are the sampled word indices in the output vocab.

  • topk_scores: Shaped (batch_size, 1). These are essentially (logits / sampling_temp)[topk_ids].

Return type

(LongTensor, FloatTensor)
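
A minimal re-implementation sketch of the documented behavior (not the library's exact code): scale by temperature, mask everything below the k-th best logit, then sample one token per row.

    import torch

    def sample_with_temperature_sketch(logits, sampling_temp, keep_topk):
        logits = logits / sampling_temp
        if keep_topk > 0:
            # Keep only the top-k logits; the rest get probability 0.
            kth_best = logits.topk(keep_topk, dim=-1)[0][:, -1:]
            logits = logits.masked_fill(logits < kth_best, -float('inf'))
        # Categorical normalizes via logsumexp, matching the note above.
        dist = torch.distributions.Categorical(logits=logits)
        topk_ids = dist.sample().unsqueeze(-1)    # (batch_size, 1)
        topk_scores = logits.gather(1, topk_ids)  # (batch_size, 1)
        return topk_ids, topk_scores

    ids, scores = sample_with_temperature_sketch(
        torch.randn(4, 100), sampling_temp=0.8, keep_topk=10)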

class onmt.translate.GreedySearch(pad, bos, eos, batch_size, min_length, block_ngram_repeat, exclusion_tokens, return_attention, max_length, sampling_temp, keep_topk)[source]

Bases: onmt.translate.decode_strategy.DecodeStrategy

Select next tokens randomly from the top k possible next tokens.

The scores attribute’s lists hold the score, after applying temperature, of the final prediction (either EOS, or the final token if max_length is reached).

Parameters
  • pad (int) – See base.

  • bos (int) – See base.

  • eos (int) – See base.

  • batch_size (int) – See base.

  • min_length (int) – See base.

  • max_length (int) – See base.

  • block_ngram_repeat (int) – See base.

  • exclusion_tokens (set[int]) – See base.

  • return_attention (bool) – See base.

  • sampling_temp (float) – See sample_with_temperature().

  • keep_topk (int) – See sample_with_temperature().

advance(log_probs, attn)[source]

Select next tokens randomly from the top k possible next tokens.

Parameters
  • log_probs (FloatTensor) – Shaped (batch_size, vocab_size). These can be logits ((-inf, inf)) or log-probs ((-inf, 0]). (The distribution actually uses the log-probabilities logits - logits.logsumexp(-1), which equals the logits if they are log-probabilities summing to 1.)

  • attn (FloatTensor) – Shaped (1, B, inp_seq_len).

initialize(memory_bank, src_lengths, src_map=None, device=None)[source]

Initialize for decoding.

update_finished()[source]

Finalize scores and predictions.
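
A hedged note on the two sampling knobs: keep_topk=1 reduces to greedy argmax decoding, while keep_topk=-1 with sampling_temp=1.0 samples from the full distribution. For example:

    from onmt.translate import GreedySearch

    # Toy token ids; pure sampling over the whole output vocabulary.
    sampler = GreedySearch(
        pad=0, bos=1, eos=2, batch_size=2, min_length=0,
        block_ngram_repeat=0, exclusion_tokens=set(),
        return_attention=False, max_length=30,
        sampling_temp=1.0, keep_topk=-1)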

Scoring

class onmt.translate.penalties.PenaltyBuilder(cov_pen, length_pen)[source]

Bases: object

Returns the Length and Coverage Penalty function for Beam Search.

Parameters
  • length_pen (str) – option name of length pen

  • cov_pen (str) – option name of cov pen

Variables
  • has_cov_pen (bool) – Whether a coverage penalty is set (if it is None, applying it is a no-op). Note that the converse isn’t true: setting beta to 0 also makes the coverage penalty a no-op.

  • has_len_pen (bool) – Whether a length penalty is set (if it is None, applying it is a no-op). Note that the converse isn’t true: setting alpha to 0 makes the Wu length penalty a no-op.

  • coverage_penalty (callable[[FloatTensor, float], FloatTensor]) – Calculates the coverage penalty.

  • length_penalty (callable[[int, float], float]) – Calculates the length penalty.

coverage_none(cov, beta=0.0)[source]

Returns zero as the penalty.

coverage_summary(cov, beta=0.0)[source]

Our summary penalty.

coverage_wu(cov, beta=0.0)[source]

GNMT coverage re-ranking score.

See “Google’s Neural Machine Translation System” [WSC+16]. cov is expected to be sized (*, seq_len), where * is probably batch_size x beam_size but could be several dimensions like (batch_size, beam_size). If cov is attention, then the seq_len axis probably sums to (almost) 1.

length_average(cur_len, alpha=0.0)[source]

Returns the current sequence length.

length_none(cur_len, alpha=0.0)[source]

Returns unmodified scores.

length_wu(cur_len, alpha=0.0)[source]

GNMT length re-ranking score.

See “Google’s Neural Machine Translation System” [WSC+16].
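
A hedged usage sketch. Assuming the Wu et al. (2016) formulas behind the 'wu' options, the length penalty is ((5 + cur_len) / 6) ** alpha and the coverage penalty is beta * sum(-log(min(cov, 1))) over the source axis.

    import torch
    from onmt.translate.penalties import PenaltyBuilder

    pen = PenaltyBuilder(cov_pen='wu', length_pen='wu')
    print(pen.length_penalty(cur_len=12, alpha=0.7))  # ((5+12)/6) ** 0.7

    # cov shaped (batch_size, beam_size, seq_len); the result drops the
    # seq_len axis.
    cov = torch.full((2, 5, 7), 0.5)
    print(pen.coverage_penalty(cov, beta=0.2).shape)  # torch.Size([2, 5])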

class onmt.translate.GNMTGlobalScorer(alpha, beta, length_penalty, coverage_penalty)[source]

Bases: object

NMT re-ranking.

Parameters
  • alpha (float) – Length parameter.

  • beta (float) – Coverage parameter.

  • length_penalty (str) – Length penalty strategy.

  • coverage_penalty (str) – Coverage penalty strategy.

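A hedged construction sketch; the penalty option strings are the PenaltyBuilder names documented above, and the scorer is typically handed to BeamSearch via the global_scorer argument.

    from onmt.translate import GNMTGlobalScorer

    # Rerank with the GNMT length penalty and the summary coverage penalty.
    scorer = GNMTGlobalScorer(alpha=0.7, beta=0.2,
                              length_penalty='wu',
                              coverage_penalty='summary')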