opennmt.bin.ark_to_records module

ARK data file to TFRecords converter.

The script takes the ARK data file and, optionally, the indexed target text, and writes aligned source and target data.


Consumes the next vector.

Parameters: ark_file – The ARK data file.
Returns: The next vector as a 2D NumPy array.
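The sketch below shows one way to consume a single entry from a Kaldi text-format ARK file, where each entry is a key followed by a matrix between `[` and `]`, one row per line. It assumes text (not binary) ARK input. The function name `consume_next_vector_sketch` is hypothetical, and it also returns the entry key because the aligned conversion needs it. Rows are returned as nested lists to keep the example dependency-free.

```python
import io
from typing import List, Optional, TextIO, Tuple

Matrix = List[List[float]]


def consume_next_vector_sketch(ark_file: TextIO) -> Optional[Tuple[str, Matrix]]:
    """Reads one "<key> [ ... ]" entry from a text-format ARK file.

    Returns (key, rows) where each row is a list of floats, or None at end of file.
    """
    header = ark_file.readline()
    while header and not header.strip():  # skip blank lines between entries
        header = ark_file.readline()
    if not header:
        return None
    key, bracket, line = header.partition("[")
    if not bracket:
        raise ValueError("expected '[' after the entry key: %r" % header)
    rows: Matrix = []
    while True:
        done = "]" in line  # the closing bracket ends the matrix
        values = line.replace("]", "").split()
        if values:
            rows.append([float(v) for v in values])
        if done:
            return key.strip(), rows
        line = ark_file.readline()
        if not line:
            raise ValueError("unterminated ARK entry for key %r" % key.strip())


sample = io.StringIO("utt1  [\n  1.0 2.0\n  3.0 4.0 ]\n")
print(consume_next_vector_sketch(sample))  # ('utt1', [[1.0, 2.0], [3.0, 4.0]])
```

Passing `rows` to `numpy.asarray(rows, dtype=numpy.float32)` would give the 2D NumPy array the docstring describes.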

Consumes the next text line from text_file.
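A minimal sketch of consuming one line of indexed text, assuming "indexed" means each line starts with the key of its ARK entry, separated from the text by whitespace. The function name `consume_next_text_sketch` and this line layout are assumptions for illustration.

```python
import io
from typing import Optional, TextIO, Tuple


def consume_next_text_sketch(text_file: TextIO) -> Optional[Tuple[str, str]]:
    """Reads one "<key> <text>" line; returns (key, text) or None at end of file."""
    line = text_file.readline()
    if not line:
        return None
    # Assumed layout: the key is the first whitespace-separated token.
    fields = line.split(None, 1)
    key = fields[0] if fields else ""
    text = fields[1].strip() if len(fields) > 1 else ""
    return key, text


sample = io.StringIO("utt1 hello world\nutt2 good bye\n")
print(consume_next_text_sketch(sample))  # ('utt1', 'hello world')
print(consume_next_text_sketch(sample))  # ('utt2', 'good bye')
print(consume_next_text_sketch(sample))  # None
```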

opennmt.bin.ark_to_records.write_text(text, writer)

Serializes a line of text.
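Assuming "serializes" here means writing the text followed by a newline to an open text stream, a sketch could look like this. The name `write_text_sketch` is hypothetical, and `io.StringIO` stands in for the output file.

```python
import io
from typing import TextIO


def write_text_sketch(text: str, writer: TextIO) -> None:
    """Writes one line of text followed by a newline to an open text stream."""
    writer.write(text)
    writer.write("\n")


buffer = io.StringIO()
write_text_sketch("hello world", buffer)
write_text_sketch("good bye", buffer)
print(repr(buffer.getvalue()))  # 'hello world\ngood bye\n'
```

Writing exactly one line per call keeps line i of the text output paired with record i of the TFRecords output.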

opennmt.bin.ark_to_records.ark_to_records_aligned(ark_filename, text_filename, out_prefix)

Converts ARK and text datasets to aligned TFRecords and text datasets.
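One plausible way to align the two datasets is to index the text lines by key, then walk the ARK entries in file order and keep only those with a matching text line. The sketch below works on in-memory `(key, value)` pairs rather than files. The name `align_sketch`, the unique-key assumption, and the choice to drop unmatched vectors are illustrative assumptions, not the module's confirmed behavior.

```python
from typing import Dict, Iterable, List, Tuple

Matrix = List[List[float]]


def align_sketch(
    vectors: Iterable[Tuple[str, Matrix]], texts: Iterable[Tuple[str, str]]
) -> Tuple[List[Matrix], List[str]]:
    """Pairs each vector with the text line that carries the same key.

    Vectors without a matching text are dropped, so both outputs have equal
    length and text i describes vector i.
    """
    text_by_key: Dict[str, str] = dict(texts)  # assumed: keys are unique
    aligned_vectors: List[Matrix] = []
    aligned_texts: List[str] = []
    for key, matrix in vectors:  # output follows the ARK file order
        if key in text_by_key:
            aligned_vectors.append(matrix)
            aligned_texts.append(text_by_key[key])
    return aligned_vectors, aligned_texts


vectors = [("utt2", [[5.0, 6.0]]), ("utt1", [[1.0, 2.0]]), ("utt3", [[7.0, 8.0]])]
texts = [("utt1", "hello world"), ("utt2", "good bye")]
print(align_sketch(vectors, texts))
# ([[[5.0, 6.0]], [[1.0, 2.0]]], ['good bye', 'hello world'])
```

Loading the text side into a dictionary trades memory for a single pass over the (typically much larger) ARK file.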

opennmt.bin.ark_to_records.ark_to_records(ark_filename, out_prefix)

Converts ARK dataset to TFRecords.
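TFRecord features hold flat lists, so a 2D matrix has to be stored as its shape plus its values in row-major order. The sketch below shows that flattening and its inverse without depending on TensorFlow. The field names `shape` and `values` are hypothetical placeholders, not a confirmed record layout for this module.

```python
from typing import Dict, List

Matrix = List[List[float]]


def to_record_fields(matrix: Matrix) -> Dict[str, List]:
    """Flattens a 2D matrix into a shape list and a row-major value list.

    The field names "shape" and "values" are hypothetical placeholders.
    """
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    if any(len(row) != cols for row in matrix):
        raise ValueError("all rows must have the same length")
    values = [value for row in matrix for value in row]
    return {"shape": [rows, cols], "values": values}


def from_record_fields(fields: Dict[str, List]) -> Matrix:
    """Inverse of to_record_fields: rebuilds the rows from shape and values."""
    rows, cols = fields["shape"]
    values = fields["values"]
    return [values[r * cols:(r + 1) * cols] for r in range(rows)]


record = to_record_fields([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(record)  # {'shape': [2, 3], 'values': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}
```

With TensorFlow installed, each such dictionary could be wrapped in a `tf.train.Example` (an `Int64List` feature for the shape, a `FloatList` feature for the values) and written with `tf.io.TFRecordWriter`.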