opennmt.bin.ark_to_records module

ARK data file to TFRecords converter.

The script takes an ARK data file and, optionally, the indexed target text, and writes aligned source and target data.

opennmt.bin.ark_to_records.consume_next_vector(ark_file)

Consumes the next vector.

Parameters: ark_file – The ARK data file.
Returns: The next vector as a 2D NumPy array.
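
To illustrate what consuming one vector can involve, here is a minimal sketch. It assumes the Kaldi text ARK layout, in which an entry starts with `key [` and its last row ends with `]`. The name `read_next_matrix` is hypothetical and this is not the module's own implementation. It returns plain nested lists, which `numpy.asarray` could convert to a 2D array.

```python
import io


def read_next_matrix(ark_file):
    """Reads one entry from a text ARK stream (hypothetical helper).

    Returns (key, rows), or (None, []) when the stream is exhausted.
    """
    key = None
    rows = []
    for line in ark_file:
        fields = line.split()
        if not fields:
            continue  # Skip blank lines.
        if key is None:
            # Assumed layout: the first line of an entry is "key [".
            key = fields[0]
            fields = fields[1:]
            if fields and fields[0] == "[":
                fields = fields[1:]
        # A trailing "]" marks the last row of the entry.
        done = bool(fields) and fields[-1] == "]"
        if done:
            fields = fields[:-1]
        if fields:
            rows.append([float(x) for x in fields])
        if done:
            break
    return key, rows


sample = io.StringIO(
    "utt1  [\n  0.1 0.2 0.3\n  0.4 0.5 0.6 ]\nutt2  [\n  1.0 2.0 ]\n"
)
first_key, first_rows = read_next_matrix(sample)
second_key, second_rows = read_next_matrix(sample)
```

Each call picks up where the previous one stopped, so repeated calls walk through the file one entry at a time.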
opennmt.bin.ark_to_records.consume_next_text(text_file)

Consumes the next text line from text_file.
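
A minimal sketch of reading one indexed text line follows. It assumes that each line starts with an index token followed by the text, which is one reading of "indexed target text" in the module description. The name `read_indexed_line` is hypothetical and is not part of the module.

```python
import io


def read_indexed_line(text_file):
    """Reads one "index text..." line (hypothetical helper).

    Returns (index, text), or (None, "") at end of file or on a blank line.
    """
    line = text_file.readline()
    tokens = line.split()
    if not tokens:
        return None, ""
    # Assumption: the first token is the index, the rest is the text.
    return tokens[0], " ".join(tokens[1:])


lines = io.StringIO("utt1 hello world\nutt2 good   morning\n")
first = read_indexed_line(lines)
second = read_indexed_line(lines)
third = read_indexed_line(lines)
```

Splitting on whitespace and joining with single spaces also normalizes irregular spacing inside the text.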

opennmt.bin.ark_to_records.write_text(text, writer)

Serializes a line of text.

opennmt.bin.ark_to_records.ark_to_records_aligned(ark_filename, text_filename, out_prefix)

Converts ARK and text datasets to aligned TFRecords and text datasets.
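
One possible way to align two keyed streams that are not guaranteed to be in the same order is to buffer unmatched entries until their counterpart appears. The sketch below shows that idea on in-memory `(key, value)` pairs. The name `align_by_key` is hypothetical, the approach is an assumption rather than the module's documented behavior, and writing the matched pairs to TFRecords and text files is left out.

```python
def align_by_key(sources, targets):
    """Yields (key, source, target) for keys present in both streams.

    Entries are buffered until the matching key is seen in the other
    stream. Keys that never find a match are silently dropped.
    """
    pending_src = {}
    pending_tgt = {}
    src_iter = iter(sources)
    tgt_iter = iter(targets)
    while True:
        src = next(src_iter, None)
        tgt = next(tgt_iter, None)
        if src is None and tgt is None:
            break  # Both streams are exhausted.
        new_keys = []
        if src is not None:
            pending_src[src[0]] = src[1]
            new_keys.append(src[0])
        if tgt is not None:
            pending_tgt[tgt[0]] = tgt[1]
            if tgt[0] not in new_keys:
                new_keys.append(tgt[0])
        # Only the keys just read can have completed a pair.
        for key in new_keys:
            if key in pending_src and key in pending_tgt:
                yield key, pending_src.pop(key), pending_tgt.pop(key)


vectors = [("a", [1.0]), ("b", [2.0]), ("c", [3.0])]
texts = [("b", "hello"), ("a", "world"), ("d", "unused")]
aligned = list(align_by_key(vectors, texts))
```

In this toy input, `c` and `d` have no counterpart and do not appear in `aligned`.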

opennmt.bin.ark_to_records.ark_to_records(ark_filename, out_prefix)

Converts ARK dataset to TFRecords.

opennmt.bin.ark_to_records.main()