opennmt.estimator module

Functions for Estimator API integration.

opennmt.estimator.make_serving_input_fn(model, metadata=None)[source]

Returns the serving input function.

Parameters:
  • model – An initialized opennmt.models.model.Model instance.
  • metadata – Optional data configuration (to be removed). Some inputters currently need to peek into the data files to infer input sizes.
Returns:

A callable that returns a tf.estimator.export.ServingInputReceiver.
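For example, a minimal usage sketch (the catalog model and vocabulary paths below are assumptions; the metadata dictionary mirrors the “data” block of a YAML configuration):

    import tensorflow as tf

    from opennmt import estimator
    from opennmt.models import catalog

    # Assumed setup: a predefined catalog model and its vocabulary files.
    model = catalog.Transformer()
    metadata = {
        "source_words_vocabulary": "src-vocab.txt",
        "target_words_vocabulary": "tgt-vocab.txt",
    }

    serving_input_fn = estimator.make_serving_input_fn(model, metadata=metadata)

    # The returned callable is typically passed to an Estimator export method,
    # e.g. estimator_instance.export_saved_model("export/", serving_input_fn).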

opennmt.estimator.make_input_fn(model, mode, batch_size, features_file, labels_file=None, batch_type='examples', batch_multiplier=1, bucket_width=None, maximum_features_length=None, maximum_labels_length=None, shuffle_buffer_size=None, single_pass=False, num_shards=1, shard_index=0, num_threads=None, prefetch_buffer_size=None, return_dataset=True)[source]

Creates the input function.

Parameters:
  • model – An initialized opennmt.models.model.Model instance.
  • mode – A tf.estimator.ModeKeys mode.
  • batch_size – The batch size to use.
  • features_file – The file containing input features.
  • labels_file – The file containing output labels.
  • batch_type – The training batching strategy to use: can be “examples” or “tokens”.
  • batch_multiplier – The batch size multiplier to prepare splitting across replicated graph parts.
  • bucket_width – The width of the length buckets to select batch candidates from. None to not constrain batch formation.
  • maximum_features_length – The maximum length or list of maximum lengths of the features sequence(s). None to not constrain the length.
  • maximum_labels_length – The maximum length of the labels sequence. None to not constrain the length.
  • shuffle_buffer_size – The number of elements from which to sample.
  • single_pass – If True, makes a single pass over the training data.
  • num_shards – The number of data shards (usually the number of workers in a distributed setting).
  • shard_index – The shard index this input pipeline should read from.
  • num_threads – The number of elements processed in parallel.
  • prefetch_buffer_size – The number of batches to prefetch asynchronously. If None, use an automatically tuned value on TensorFlow 1.8+ and 1 on older versions.
  • return_dataset – If True, the input function returns a tf.data.Dataset directly; otherwise it returns the next element.
Returns:

The input function.

See also

tf.estimator.Estimator.
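For example, the following sketch builds a training input function (the catalog model, data files, and batch settings are assumptions):

    import tensorflow as tf

    from opennmt import estimator
    from opennmt.models import catalog

    # Assumed setup: a predefined catalog model and tokenized training files.
    model = catalog.Transformer()

    train_input_fn = estimator.make_input_fn(
        model,
        tf.estimator.ModeKeys.TRAIN,
        4096,                        # batch size, here counted in tokens
        "train.src",
        labels_file="train.tgt",
        batch_type="tokens",         # batch by token count rather than example count
        bucket_width=1,              # group sequences of similar lengths
        maximum_features_length=100,
        maximum_labels_length=100,
        shuffle_buffer_size=500000,
    )

    # The callable can then be passed to tf.estimator.Estimator.train, e.g.:
    # estimator_instance.train(train_input_fn, max_steps=500000)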

opennmt.estimator.make_model_fn(model, eval_prediction_hooks_fn=None, num_devices=1, devices=None, hvd=None)[source]

Creates the model function.

Parameters:
  • model – An initialized opennmt.models.model.Model instance.
  • eval_prediction_hooks_fn – A callable that takes the model predictions during evaluation and returns an iterable of evaluation hooks (e.g. for saving predictions to disk, running external evaluators, etc.).
  • num_devices – The number of devices used for training.
  • devices – The list of devices used for training, if known.
  • hvd – Optional Horovod object.

See also

tf.estimator.Estimator’s model_fn argument for more details about the arguments and the returned value.
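For example, a minimal sketch wiring the model function into an Estimator (the catalog model, run directory, and checkpoint frequency are assumptions):

    import tensorflow as tf

    from opennmt import estimator
    from opennmt.models import catalog

    # Assumed setup: a predefined catalog model and a local run directory.
    model = catalog.Transformer()

    estimator_instance = tf.estimator.Estimator(
        estimator.make_model_fn(model, num_devices=1),
        model_dir="run/",
        config=tf.estimator.RunConfig(save_checkpoints_steps=5000),
    )

    # Training and evaluation then use input functions built with
    # opennmt.estimator.make_input_fn (see above).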