Saved data format
Several kinds of data are saved as one uses the Framework: simulation outputs, simulation settings, treatment results, etc.
Current format
Simulation output
All the outputs of the simulation are saved in a single HDF5 file, output.h5, written with h5py. Each step N of the simulation corresponds to a group step_N, and each field ux, uy, ... is saved as one dataset of the same name inside that group.
The following data is saved in each step (group step_N):
As attributes:
t: the simulation’s time
N_points: the base size of the grid
k_min: the grid’s minimum wavevector
k0: if True, the grid supports the k=0 mode, False otherwise
ttrack_X: elapsed time for key X
As datasets:
The fields on the grid, each stored under its field name.
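The layout described above can be sketched with h5py as follows. This is a minimal illustration, not the Framework's actual code: the helper names save_step and load_step, the field shapes, and the attribute values are assumptions made for the example.

```python
import os
import tempfile

import h5py
import numpy as np


def save_step(path, n, fields, **attrs):
    """Append step `n` to the HDF5 file as a group named step_<n>."""
    with h5py.File(path, "a") as f:
        group = f.create_group(f"step_{n}")
        # Scalar metadata (t, N_points, k_min, k0, ...) is stored as attributes.
        for key, value in attrs.items():
            group.attrs[key] = value
        # Each field becomes one dataset named after the field.
        for name, array in fields.items():
            group.create_dataset(name, data=array)


def load_step(path, n):
    """Return (fields, attrs) for step `n` as plain dictionaries."""
    with h5py.File(path, "r") as f:
        group = f[f"step_{n}"]
        fields = {name: dataset[()] for name, dataset in group.items()}
        attrs = dict(group.attrs)
    return fields, attrs


# Hypothetical example data: two 8x8 fields at step 0.
path = os.path.join(tempfile.mkdtemp(), "output.h5")
ux = np.zeros((8, 8))
uy = np.ones((8, 8))
save_step(path, 0, {"ux": ux, "uy": uy}, t=0.0, N_points=8, k_min=1.0, k0=True)
fields, attrs = load_step(path, 0)
```

Reading a single step only touches its own group, so the other steps in the file do not need to be loaded.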
Simulation settings
The settings of the simulation are saved in a JSON file, settings.json, using orjson. Note that this library does not preserve np.nan, np.inf-like values, which are not valid JSON. We do not put the settings inside the output.h5 file for two reasons. First, the data is too unstructured (nested dictionaries of arbitrary types). Second, we want the settings to be human-readable.
The following data is saved in the settings:
init_t: the time at which to start the simulation
end_simulation={"t", "elapsed_time", "step", "ode_step"}
    t: time at which to end the simulation
    elapsed_time: real-life time elapsed at which to end the simulation
    step: saved step N at which to end the simulation
    ode_step: number of ODE steps at which to end the simulation
N_steps: number of saved steps since the start
N_ode_steps: number of performed ODE steps since the start
l_params, D, fields_name, simu_params: same as set in Solver()
solver_params: same as set in Solver.solve
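Because non-finite floats are not valid JSON, one way to keep the settings file well-formed is to replace them before saving. The sketch below is only an illustration: it uses the standard-library json module in place of orjson to stay dependency-free, the sanitize helper is hypothetical, and the setting values (including the use of inf for unset limits) are made up for the example.

```python
import json
import math


def sanitize(value):
    """Recursively replace non-finite floats (nan, inf) with None."""
    if isinstance(value, float) and not math.isfinite(value):
        return None
    if isinstance(value, dict):
        return {key: sanitize(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [sanitize(item) for item in value]
    return value


# Hypothetical settings; only the key names come from the list above.
settings = {
    "init_t": 0.0,
    "end_simulation": {
        "t": 10.0,
        "elapsed_time": float("inf"),
        "step": 100,
        "ode_step": float("inf"),
    },
    "N_steps": 0,
    "N_ode_steps": 0,
}

# allow_nan=False makes json raise instead of emitting invalid JSON,
# so any non-finite value missed by sanitize() is caught at save time.
text = json.dumps(sanitize(settings), allow_nan=False, indent=2)
restored = json.loads(text)
```

With this approach the reader of settings.json has to interpret null as "no limit" (or whatever the non-finite value meant), which is a convention to document alongside the file.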
Simulation source
The source used to run the simulation is saved as source.py. The goal is to be able to recover, long after the simulation has run, the exact code that was used (forcing, initial conditions, …). For the same human-readability reason as above, we don’t put it in output.h5.
Treatment computations
The quantities computed during the treatment of the simulation are saved in drawables.npy. The goal is to be able to reuse them without recomputing when we want to plot the same data in a different way. As they also contain nested structures of arbitrary length, it is impractical to save them in output.h5.
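As an illustration of why a .npy file suits nested data, here is a minimal sketch of saving and reloading a nested dictionary with NumPy. The contents of drawables below are hypothetical; the actual structure produced by the treatment functions is not specified here.

```python
import os
import tempfile

import numpy as np

# Hypothetical drawables: nested containers of varying length.
drawables = {
    "spectrum": {"k": np.arange(1, 5), "E": np.array([1.0, 0.5, 0.25, 0.125])},
    "energy_vs_time": [(0.0, 1.0), (0.1, 0.9)],
}

path = os.path.join(tempfile.mkdtemp(), "drawables.npy")

# A dict is not an array, so NumPy wraps it in a 0-d object array
# and stores it with pickle.
np.save(path, drawables, allow_pickle=True)

# Loading pickled objects must be enabled explicitly; .item() unwraps the dict.
loaded = np.load(path, allow_pickle=True).item()
```

Since loading relies on pickle, such files should only be loaded when they come from a trusted source.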
Treatment outputs
Most treatment functions save an image of their plot.
Backups
Both output.h5 and settings.json are written to frequently, so there is a significant risk of data corruption if the process is killed during a write. To avoid this, a backup of each file (output_bk.h5 and settings_bk.json) is kept. The backup is updated only once the main file has been written to and closed. When reading either file, if we fail to read the main one, we fall back on the backup.
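The write-then-backup and read-with-fallback logic can be sketched as follows. This is an illustration under assumptions, not the Framework's implementation: the function names are hypothetical, and the standard-library json module stands in for the real writers.

```python
import json
import os
import shutil
import tempfile


def save_with_backup(path, backup_path, data):
    """Write the main file, then refresh the backup once the write succeeded."""
    with open(path, "w") as f:
        json.dump(data, f)
    # The main file is closed and complete here, so it is safe to copy.
    shutil.copyfile(path, backup_path)


def load_with_fallback(path, backup_path):
    """Read the main file; if it is missing or corrupted, read the backup."""
    try:
        with open(path) as f:
            return json.load(f)
    except (OSError, ValueError):
        # json.JSONDecodeError is a subclass of ValueError.
        with open(backup_path) as f:
            return json.load(f)


folder = tempfile.mkdtemp()
main = os.path.join(folder, "settings.json")
backup = os.path.join(folder, "settings_bk.json")

save_with_backup(main, backup, {"N_steps": 3})

# Simulate a process killed mid-write: the main file is left truncated.
with open(main, "w") as f:
    f.write('{"N_steps": ')

recovered = load_with_fallback(main, backup)
```

At any moment at most one of the two files is being written, so at least one of them is always in a consistent state.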
Downside: now that all the outputs are saved in one big file, the backup takes significant space, effectively doubling the disk usage of each simulation. The solution is to prune the backup files once the simulation is finished. A Python tool to automatically prune a directory recursively is planned in issue #31.
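Until that tool exists, pruning can be done by hand. The sketch below shows one possible way to do it with pathlib; it is not the tool planned in the issue, and the function name prune_backups and the example directory layout are assumptions.

```python
import tempfile
from pathlib import Path

# Backup file names as described in this section.
BACKUP_NAMES = ("output_bk.h5", "settings_bk.json")


def prune_backups(root):
    """Delete every backup file found under `root`, recursively."""
    removed = []
    for name in BACKUP_NAMES:
        # Materialize the matches first so we do not delete while iterating.
        for path in list(Path(root).rglob(name)):
            path.unlink()
            removed.append(path)
    return sorted(removed)


# Hypothetical directory tree with one finished simulation.
root = Path(tempfile.mkdtemp())
run = root / "run_a" / "sub"
run.mkdir(parents=True)
for name in ("output.h5", "output_bk.h5", "settings.json", "settings_bk.json"):
    (run / name).write_bytes(b"")

removed = prune_backups(root)
remaining = sorted(p.name for p in run.iterdir())
```

This should only be run on simulations that are finished, since a running simulation still relies on its backups.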
Old format
Gitlab commit: 9de8071cd20936f8a2c0839f91fa63272eff66aa
Warning
This save format is still supported in 2.x, but may be discontinued in a future version.
Each time step is saved in a separate .npz file. Each file contains a dict with keys {fields, t, N_points, k_min, elapsed_time, k0, ...} (same meaning as above).
Downside: this creates a lot of small files, which is a pain when moving, deleting, or listing them.
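For reference, a per-step .npz file with named keys can be written and read with NumPy as sketched below. The file name step_0.npz, the stacked layout of fields, and the values are assumptions made for the example; only the key names come from the description above.

```python
import os
import tempfile

import numpy as np

folder = tempfile.mkdtemp()
# Hypothetical layout: two fields stacked along the first axis.
fields = np.zeros((2, 8, 8))

path = os.path.join(folder, "step_0.npz")
# Keyword arguments become the keys of the archive.
np.savez(path, fields=fields, t=0.0, N_points=8, k_min=1.0,
         elapsed_time=0.0, k0=True)

with np.load(path) as data:
    # Scalars come back as 0-d arrays; convert them as needed.
    step = {key: data[key] for key in data.files}
```

Since every entry is looked up by name, new keys can be added to the save without breaking older readers.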
Older format
Gitlab commit: 86a41d9d99e26b2c3bebcce7ff15c2ccb0f521c5
Warning
This save format is still supported in 2.x, but may be discontinued in a future version.
Each time step is saved in a separate .npz file. Each file contains a dict with keys {arr_0, arr_1, ...} which are then remapped onto {fields, t, N_points, k_min, elapsed_time, k0, ...} (same meaning as above).
Downside: the load/save order of the arrays matters, so we can’t add arbitrary data to the save.
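The positional layout and its remapping can be illustrated with NumPy, which names positional arrays arr_0, arr_1, ... in the order they are passed to np.savez. The key order in LEGACY_KEYS, the file name, and the values below are assumptions for this sketch; the real order is fixed by the legacy save code.

```python
import os
import tempfile

import numpy as np

# Order assumed for this sketch only.
LEGACY_KEYS = ("fields", "t", "N_points", "k_min", "elapsed_time", "k0")

folder = tempfile.mkdtemp()
path = os.path.join(folder, "step_0.npz")
fields = np.zeros((2, 8, 8))

# Positional arguments are stored as arr_0, arr_1, ... in call order.
np.savez(path, fields, 0.5, 8, 1.0, 12.0, True)

with np.load(path) as data:
    names = sorted(data.files)
    # Remap arr_i onto the i-th legacy key.
    step = {key: data[f"arr_{i}"] for i, key in enumerate(LEGACY_KEYS)}
```

Swapping two arguments in the save call silently swaps their meaning on load, which is the fragility described above.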