Saved data format

Several kinds of data are saved as one uses the Framework: simulation outputs, simulation settings, treatment results, etc.

Current format

Simulation output

All the outputs of the simulation are saved in a single output.h5 hdf5 file using h5py. Each step N of the simulation corresponds to a group step_N, and each field ux, uy, ... is saved as one dataset with the same name in the group.

The following data is saved in each step (group step_N):

As attributes:
- t: the simulation’s time
- N_points: the base size of the grid
- k_min: the grid’s minimum wavevector
- k0: if True, the grid supports the k=0k=0 mode, false otherwise
- ttrack_X: elapsed time for key X
As datasets
- The fields on the grid, each named after their field name.

Simulation settings

The settings of the simulation are saved in a settings.json JSON file using orjson. Note that this library doesn’t allow np.nan, np.inf-like objects. We do not put this inside the output.h5 file for two reasons. First, the data is too unstructured (nested dictionaries of arbitrary types). Second, we want the settings to be human-readable.

The following data is saved in the settings:

init_t: the time at which to start the simulation
end_simulation={"t", "elapsed_time", "step", "ode_step"}
- t: time at which to end the simulation
- elapsed_time: real-life time elapsed at which to end the simulation
- step: saved step N at which to end the simulation
- ode_step: number of ode steps at which to end the simulation
N_steps: number of saved sters since the start
N_ode_steps: number of performed ode steps since the start
l_params, D, fields_name, simu_params: same as set in Solver()
solver_params: same as set in Solver.solve

Simulation source

The source used to run the simulation is saved as source.py. The goal is to remember a long time after we have run the simulation what was the exact code used (forcing, initial conditions, …). For the same human-readability reason as above, we don’t put it in output.h5.

Treatment computations

The computed quantities from the treatment of the simulation are saved in drawables.npy. The goal is to be able to reuse them without recomputing if we want to plot the same data a different way. As they also contain nested structures of arbitrary length, it is unpractical to save them in output.h5.

Treatment outputs

Most treatment functions save an image of their plot.

Backups

Both output.h5 and settings.json are frequently written to. As a result, there is a significant risk of data corruption if the process is killed while writing. To avoid this risk, a backup of each file output_bk.h5 and settings_bk.json is saved. Once the main file has been written to and closed, we update the backup. When reading either file, if we fail to read the main one, we fall back on the backup.

Downside: Now that we save all the outputs as a big file, the backup file takes a significant space, effectively doubling the disk space of the simulations. The solution is to prune the backup files once the simulation is finished. A python tool to automatically prune a directory recursively is planned in issue #31.

Old format

Gitlab commit: 9de8071cd20936f8a2c0839f91fa63272eff66aa

Warning

This save format is still supported in 2.x, but may be discontinued in a future version

Each time step is saved in a separate .npz file. Each file contains a dict with keys {fields, t, N_points, k_min, elapsed_time, k0, ...} (same meaning as above).

Downside: This creates a lot of small files, which is a pain when moving, deleting, listing etc.

Older format

Gitlab commit: 86a41d9d99e26b2c3bebcce7ff15c2ccb0f521c5

Warning

This save format is still supported in 2.x, but may be discontinued in a future version

Each time step is saved in a separate .npz file. Each file contains a dict with keys {arr_0, arr_1, ...} which are then remapped onto {fields, t, N_points, k_min, elapsed_time, k0, ...} (same meaning as above).

Downside: The load/save order of the arrays is important. We can’t add arbitrary data to the save.