Translate

translate.py

usage: translate.py [-h] [-config CONFIG] [-save_config SAVE_CONFIG] --model
                    MODEL [MODEL ...] [--fp32] [--avg_raw_probs]
                    [--data_type DATA_TYPE] --src SRC [--src_dir SRC_DIR]
                    [--tgt TGT] [--shard_size SHARD_SIZE] [--output OUTPUT]
                    [--report_bleu] [--report_rouge] [--report_time]
                    [--dynamic_dict] [--share_vocab]
                    [--random_sampling_topk RANDOM_SAMPLING_TOPK]
                    [--random_sampling_temp RANDOM_SAMPLING_TEMP]
                    [--seed SEED] [--beam_size BEAM_SIZE]
                    [--min_length MIN_LENGTH] [--max_length MAX_LENGTH]
                    [--max_sent_length] [--stepwise_penalty]
                    [--length_penalty {none,wu,avg}]
                    [--coverage_penalty {none,wu,summary}] [--alpha ALPHA]
                    [--beta BETA] [--block_ngram_repeat BLOCK_NGRAM_REPEAT]
                    [--ignore_when_blocking IGNORE_WHEN_BLOCKING [IGNORE_WHEN_BLOCKING ...]]
                    [--replace_unk] [--verbose] [--log_file LOG_FILE]
                    [--log_file_level {WARNING,DEBUG,INFO,CRITICAL,NOTSET,ERROR,30,10,20,50,0,40}]
                    [--attn_debug] [--dump_beam DUMP_BEAM] [--n_best N_BEST]
                    [--batch_size BATCH_SIZE] [--gpu GPU]
                    [--sample_rate SAMPLE_RATE] [--window_size WINDOW_SIZE]
                    [--window_stride WINDOW_STRIDE] [--window WINDOW]
                    [--image_channel_size {3,1}]

Named Arguments

-config, --config
 config file path
-save_config, --save_config
 config file save path

Model

--model, -model
 

Path to model .pt file(s). Multiple models can be specified for ensemble decoding.

Default: []

--fp32, -fp32

Force the model to run in FP32, because FP16 is very slow on GTX 1080 (Ti) GPUs.

Default: False

--avg_raw_probs, -avg_raw_probs
 

If this is set, during ensembling, scores from different models will be combined by averaging their raw probabilities and then taking the log. Otherwise, the log probabilities will be averaged directly. This is necessary for models whose output layers can assign zero probability.

Default: False
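
The difference is easiest to see with a model that can assign an exact zero. A minimal sketch with two hypothetical per-token probabilities (illustrative only, not OpenNMT code):

```python
import math

# Hypothetical per-token probabilities from two ensemble members;
# the second model assigns exact zero probability.
p1, p2 = 0.5, 0.0

# --avg_raw_probs: average the raw probabilities, then take the log.
avg_raw = math.log((p1 + p2) / 2)  # log(0.25), finite

# Default behaviour: average the log probabilities directly.
# log(0.0) is -inf, so the ensemble score degenerates.
log_p2 = math.log(p2) if p2 > 0 else float("-inf")
avg_log = (math.log(p1) + log_p2) / 2  # -inf
```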

Data

--data_type, -data_type
 

Type of the source input. Options: [text|img].

Default: “text”

--src, -src Source sequence to decode (one line per sequence)
--src_dir, -src_dir
 

Source directory for image or audio files

Default: “”

--tgt, -tgt True target sequence (optional)
--shard_size, -shard_size
 

Divide src and tgt (if applicable) into multiple smaller src and tgt files, then build shards. Each shard will have opt.shard_size samples, except the last. shard_size=0 means no segmentation; shard_size>0 means segment the dataset into multiple shards of shard_size samples each.

Default: 10000
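
The sharding arithmetic can be sketched as follows (a hypothetical helper, not OpenNMT's implementation):

```python
def shard_counts(n_samples, shard_size):
    """Samples per shard: full shards of shard_size, plus a smaller
    final shard; shard_size <= 0 means no segmentation."""
    if shard_size <= 0:
        return [n_samples]
    full, rest = divmod(n_samples, shard_size)
    return [shard_size] * full + ([rest] if rest else [])

shard_counts(25000, 10000)  # [10000, 10000, 5000]
shard_counts(25000, 0)      # [25000]
```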

--output, -output
 

Path to output the predictions (each line will be the decoded sequence).

Default: “pred.txt”

--report_bleu, -report_bleu
 

Report BLEU score after translation by calling tools/multi-bleu.perl on the command line.

Default: False

--report_rouge, -report_rouge
 

Report ROUGE 1/2/3/L/SU4 scores after translation by calling tools/test_rouge.py on the command line.

Default: False

--report_time, -report_time
 

Report some translation time metrics

Default: False

--dynamic_dict, -dynamic_dict
 

Create dynamic dictionaries

Default: False

--share_vocab, -share_vocab
 

Share source and target vocabulary

Default: False

Random Sampling

--random_sampling_topk, -random_sampling_topk
 

Set this to -1 to sample from the full distribution. Set it to a value k>1 to restrict sampling to the k most likely next tokens. Set it to 1 to use argmax, or when doing beam search.

Default: 1

--random_sampling_temp, -random_sampling_temp
 

If doing random sampling, divide the logits by this before computing softmax during decoding.

Default: 1.0
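
Together, --random_sampling_topk and --random_sampling_temp describe a temperature-scaled softmax, optionally restricted to the top-k tokens. A sketch with a hypothetical function (not OpenNMT code):

```python
import math

def sampling_distribution(logits, temp=1.0, topk=-1):
    """Softmax over temperature-divided logits; if topk > 0, tokens
    outside the top-k are masked to -inf (zero probability)."""
    scaled = [l / temp for l in logits]
    if topk > 0:
        cutoff = sorted(scaled, reverse=True)[topk - 1]
        scaled = [s if s >= cutoff else float("-inf") for s in scaled]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]  # exp(-inf) == 0.0
    z = sum(exps)
    return [e / z for e in exps]

# Lower temperature sharpens the distribution; topk=2 zeroes the rest.
probs = sampling_distribution([2.0, 1.0, 0.1], temp=0.5, topk=2)
```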

--seed, -seed

Random seed

Default: 829

Beam

--beam_size, -beam_size
 

Beam size

Default: 5

--min_length, -min_length
 

Minimum prediction length

Default: 0

--max_length, -max_length
 

Maximum prediction length.

Default: 100

--max_sent_length, -max_sent_length
 Deprecated, use -max_length instead
--stepwise_penalty, -stepwise_penalty
 

Apply the coverage penalty at every decoding step. Helpful for the summary coverage penalty.

Default: False

--length_penalty, -length_penalty
 

Possible choices: none, wu, avg

Length Penalty to use.

Default: “none”

--coverage_penalty, -coverage_penalty
 

Possible choices: none, wu, summary

Coverage Penalty to use.

Default: “none”

--alpha, -alpha
 

Google NMT length penalty parameter (higher values favor longer generations).

Default: 0.0
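
With --length_penalty wu, alpha parameterizes the Wu et al. (2016) Google NMT penalty that beam scores are divided by. A sketch of the formula as commonly stated (treat the exact constants as an assumption, not a reading of OpenNMT's source):

```python
def wu_length_penalty(length, alpha):
    """Google NMT length penalty: lp(Y) = ((5 + |Y|) / 6) ** alpha.
    Beam scores are divided by this, so larger alpha rewards length."""
    return ((5 + length) / 6) ** alpha

wu_length_penalty(20, 0.0)  # 1.0 — alpha = 0 (the default) disables it
```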

--beta, -beta

Coverage penalty parameter

Default: -0.0

--block_ngram_repeat, -block_ngram_repeat
 

Block repetition of ngrams during decoding.

Default: 0
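
The check behind this option can be sketched as follows (a hypothetical helper, not OpenNMT's implementation): a hypothesis is penalized when its most recent n-gram has already occurred earlier in the output.

```python
def repeats_ngram(tokens, n):
    """True if the trailing n-gram of `tokens` appeared earlier."""
    if n <= 0 or len(tokens) < n:
        return False
    last = tuple(tokens[-n:])
    seen = {tuple(tokens[i:i + n]) for i in range(len(tokens) - n)}
    return last in seen

repeats_ngram("the cat sat on the cat".split(), 2)  # True
repeats_ngram("a b c".split(), 2)                   # False
```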

--ignore_when_blocking, -ignore_when_blocking
 

Ignore these strings when blocking repeats (i.e., exempt them from n-gram blocking). You typically want to list sentence delimiters here.

Default: []

--replace_unk, -replace_unk
 

Replace the generated UNK tokens with the source token that had the highest attention weight. If a phrase table is provided, it will look up the identified source token and emit the corresponding target token. If no table is provided (or the identified source token is not in the table), it will copy the source token.

Default: False
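
The replacement logic can be sketched as follows (an illustrative helper with made-up tokens, not OpenNMT's implementation):

```python
def replace_unk(pred_tokens, src_tokens, attn, phrase_table=None):
    """For each '<unk>', pick the source token with the highest
    attention weight; map it through the phrase table if possible,
    otherwise copy it verbatim."""
    phrase_table = phrase_table or {}
    out = []
    for tok, weights in zip(pred_tokens, attn):
        if tok == "<unk>":
            src = src_tokens[max(range(len(weights)),
                                 key=weights.__getitem__)]
            tok = phrase_table.get(src, src)
        out.append(tok)
    return out

# 'Zidane' gets the most attention for the <unk>, and is copied as-is.
replace_unk(["hello", "<unk>"], ["Bonjour", "Zidane"],
            [[0.9, 0.1], [0.2, 0.8]])  # ['hello', 'Zidane']
```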

Logging

--verbose, -verbose
 

Print scores and predictions for each sentence

Default: False

--log_file, -log_file
 

Output logs to a file under this path.

Default: “”

--log_file_level, -log_file_level
 

Possible choices: WARNING, DEBUG, INFO, CRITICAL, NOTSET, ERROR, 30, 10, 20, 50, 0, 40

Default: “0”

--attn_debug, -attn_debug
 

Print best attn for each word

Default: False

--dump_beam, -dump_beam
 

File to dump beam information to.

Default: “”

--n_best, -n_best
 

If verbose is set, output the n_best decoded sentences.

Default: 1

Efficiency

--batch_size, -batch_size
 

Batch size

Default: 30

--gpu, -gpu

Device to run on

Default: -1

Speech

--sample_rate, -sample_rate
 

Sample rate.

Default: 16000

--window_size, -window_size
 

Window size for spectrogram in seconds

Default: 0.02

--window_stride, -window_stride
 

Window stride for spectrogram in seconds

Default: 0.01
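
With the defaults above, the spectrogram window and hop lengths work out to simple products of sample rate and duration (plain arithmetic, not OpenNMT code):

```python
sample_rate = 16000     # --sample_rate, Hz
window_size = 0.02      # --window_size, seconds
window_stride = 0.01    # --window_stride, seconds

# Samples per analysis window and samples between window starts.
window_samples = int(sample_rate * window_size)   # 320
hop_samples = int(sample_rate * window_stride)    # 160
```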

--window, -window
 

Window type for spectrogram generation

Default: “hamming”

--image_channel_size, -image_channel_size
 

Possible choices: 3, 1

Using grayscale (1-channel) images makes the model smaller and training faster.

Default: 3