opennmt.optimizers.mixed_precision_wrapper module

Wrapper that maintains and updates a float32 copy of the weights.

class opennmt.optimizers.mixed_precision_wrapper.MixedPrecisionOptimizerWrapper(optimizer, loss_scale=None)[source]
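The float32 master copy matters because float16 has too little precision to accumulate small gradient updates directly. A minimal illustration (assuming NumPy; this snippet is not part of the module) shows an update that is lost to rounding in float16 but preserved in a float32 copy:

```python
import numpy as np

# A weight stored in float16 and a float32 "master" copy of it.
w16 = np.float16(1.0)
w32 = np.float32(1.0)

update = 1e-4  # a small gradient step, typical late in training

# Applied directly in float16, the update is lost to rounding
# (the spacing between float16 values near 1.0 is ~0.001)...
assert np.float16(w16 + np.float16(update)) == np.float16(1.0)

# ...but the float32 master copy accumulates it correctly.
w32 = np.float32(w32 + update)
assert w32 > np.float32(1.0)
```

This is why the wrapper applies updates to float32 copies and casts the result back to the float16 variables used for the forward and backward passes.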


compute_gradients(loss, var_list=None, gate_gradients=1, aggregation_method=None, colocate_gradients_with_ops=False, grad_loss=None)[source]

Compute gradients of loss for the variables in var_list.

This is the first part of minimize(). It returns a list of (gradient, variable) pairs where “gradient” is the gradient for “variable”. Note that “gradient” can be a Tensor, an IndexedSlices, or None if there is no gradient for the given variable.

Parameters:
  • loss – A Tensor containing the value to minimize or a callable taking no arguments which returns the value to minimize. When eager execution is enabled it must be a callable.
  • var_list – Optional list or tuple of tf.Variable to update to minimize loss. Defaults to the list of variables collected in the graph under the key GraphKeys.TRAINABLE_VARIABLES.
  • gate_gradients – How to gate the computation of gradients. Can be GATE_NONE, GATE_OP, or GATE_GRAPH.
  • aggregation_method – Specifies the method used to combine gradient terms. Valid values are defined in the class AggregationMethod.
  • colocate_gradients_with_ops – If True, try colocating gradients with the corresponding op.
  • grad_loss – Optional. A Tensor holding the gradient computed for loss.

Returns: A list of (gradient, variable) pairs. Variable is always present, but gradient can be None.

Raises:
  • TypeError – If var_list contains anything other than Variable objects.
  • ValueError – If some arguments are invalid.
  • RuntimeError – If called with eager execution enabled and loss is not callable.

Eager compatibility: when eager execution is enabled, gate_gradients, aggregation_method, and colocate_gradients_with_ops are ignored.
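In a mixed precision setup, the loss is typically multiplied by a loss scale before gradients are computed so that small float16 gradients do not underflow, and the resulting gradients are divided by the same scale afterwards. A pure-Python sketch of that unscaling step (the function name is illustrative, not the wrapper's internals):

```python
def unscale_gradients(grads_and_vars, loss_scale):
    """Divide each gradient by the loss scale, keeping the
    (gradient, variable) pairing.

    Gradients that are None (no gradient for that variable)
    pass through unchanged.
    """
    return [
        (None if g is None else g / loss_scale, v)
        for g, v in grads_and_vars
    ]

# Example: gradients computed from a loss that was scaled by 128.
scaled = [(256.0, "w1"), (None, "w2"), (64.0, "b")]
unscaled = unscale_gradients(scaled, 128.0)
# unscaled == [(2.0, "w1"), (None, "w2"), (0.5, "b")]
```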

apply_gradients(grads_and_vars, global_step=None, name=None)[source]

Apply gradients to variables.

This is the second part of minimize(). It returns an Operation that applies gradients.

Parameters:
  • grads_and_vars – List of (gradient, variable) pairs as returned by compute_gradients().
  • global_step – Optional Variable to increment by one after the variables have been updated.
  • name – Optional name for the returned operation. Defaults to the name passed to the Optimizer constructor.

Returns: An Operation that applies the specified gradients. If global_step was not None, that operation also increments global_step.

Raises:
  • TypeError – If grads_and_vars is malformed.
  • ValueError – If none of the variables have gradients.
  • RuntimeError – If you should use _distributed_apply() instead.
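A loss-scaled apply step usually unscales the gradients, skips the update entirely if any gradient overflowed to NaN or infinity, and otherwise updates the float32 master weights. A hypothetical pure-Python analogue of that control flow (names and the plain SGD update are illustrative assumptions, not this module's code):

```python
import math

def apply_gradients_sketch(master_weights, grads, lr, loss_scale):
    """Unscale gradients; skip the step if any is non-finite,
    otherwise apply an SGD update to the float32 master weights."""
    unscaled = [g / loss_scale for g in grads]
    if any(not math.isfinite(g) for g in unscaled):
        return master_weights, False  # overflow: skip this step
    updated = [w - lr * g for w, g in zip(master_weights, unscaled)]
    return updated, True

# An overflowed gradient leaves the weights untouched...
weights, applied = apply_gradients_sketch(
    [1.0, 2.0], [128.0, float("inf")], lr=0.1, loss_scale=128.0)
# applied is False, weights == [1.0, 2.0]

# ...while finite gradients are unscaled and applied.
weights, applied = apply_gradients_sketch(
    [1.0, 2.0], [128.0, 256.0], lr=0.1, loss_scale=128.0)
# applied is True, weights == [0.9, 1.8]
```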

class opennmt.optimizers.mixed_precision_wrapper.AutomaticLossScaler(algorithm='Backoff', params=None)[source]

Bases: object

SUPPORTED_ALGOS = ['backoff', 'logmax']
update_op(has_nan, amax)[source]
static check_grads(grads_and_vars)[source]
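Judging from the has_nan and amax arguments that update_op receives, the gradient check presumably scans the gradients for non-finite values and tracks the largest absolute magnitude. A hypothetical pure-Python analogue (the function name and flat-list gradient representation are illustrative):

```python
import math

def check_grads_sketch(grads_and_vars):
    """Return (has_nan, amax): whether any gradient contains a
    non-finite value, and the largest finite absolute value seen."""
    has_nan = False
    amax = 0.0
    for grad, _var in grads_and_vars:
        if grad is None:
            continue
        for g in grad:
            if math.isnan(g) or math.isinf(g):
                has_nan = True
            else:
                amax = max(amax, abs(g))
    return has_nan, amax

has_nan, amax = check_grads_sketch(
    [([0.5, -3.0], "w"), ([float("nan")], "b")])
# has_nan is True, amax == 3.0
```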
class opennmt.optimizers.mixed_precision_wrapper.BackoffScaler(params)[source]

Bases: object

update_op(has_nan, amax)[source]
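A backoff loss scaler typically halves the scale whenever an overflow is detected and raises it again after a fixed number of consecutive overflow-free steps. A hypothetical pure-Python version of that policy (the class name, window size, and bounds are illustrative defaults, not this module's):

```python
class BackoffScalerSketch:
    def __init__(self, scale=2.0 ** 15, step_factor=2.0,
                 step_window=2000, min_scale=1.0, max_scale=2.0 ** 24):
        self.scale = scale
        self.step_factor = step_factor
        self.step_window = step_window
        self.min_scale = min_scale
        self.max_scale = max_scale
        self.good_steps = 0

    def update(self, has_overflow):
        if has_overflow:
            # Back off: shrink the scale and restart the window.
            self.scale = max(self.scale / self.step_factor, self.min_scale)
            self.good_steps = 0
        else:
            self.good_steps += 1
            if self.good_steps >= self.step_window:
                # Stable for a full window: try a larger scale.
                self.scale = min(self.scale * self.step_factor, self.max_scale)
                self.good_steps = 0

scaler = BackoffScalerSketch(scale=1024.0, step_window=3)
scaler.update(True)        # overflow: 1024 -> 512
for _ in range(3):
    scaler.update(False)   # three clean steps: 512 -> 1024
# scaler.scale == 1024.0
```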
class opennmt.optimizers.mixed_precision_wrapper.LogMaxScaler(params)[source]

Bases: object

update_op(has_nan, amax)[source]

Returns the loss scale argument from user parameters.

Parameters: params – A dictionary containing the user parameters.

Returns: A value that can be passed to the loss_scale constructor argument.