lightning¶
Lightning utilities.
- cneuromax.fitting.deeplearning.utils.lightning.instantiate_trainer(trainer_partial, logger_partial, device, output_dir, save_every_n_train_steps)[source]¶
Instantiates trainer_partial.

- Parameters:
  - trainer_partial (partial[Trainer])
  - logger_partial (partial[WandbLogger])
  - output_dir (str) – See output_dir.
  - save_every_n_train_steps (int | None) – See save_every_n_train_steps.
- Return type:
  Trainer
- Returns:
  A lightning.pytorch.Trainer instance.
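A minimal usage sketch follows; the partial contents, the device string, the output directory and the W&B project name are illustrative assumptions rather than values taken from the cneuromax source, and any remaining Trainer/WandbLogger arguments are presumably filled in by instantiate_trainer itself.

```python
from functools import partial

from lightning.pytorch import Trainer
from lightning.pytorch.loggers import WandbLogger

from cneuromax.fitting.deeplearning.utils.lightning import instantiate_trainer

trainer = instantiate_trainer(
    trainer_partial=partial(Trainer, max_epochs=10),      # pre-bound Trainer arguments (illustrative)
    logger_partial=partial(WandbLogger, project="demo"),  # hypothetical W&B project name
    device="gpu",                                         # assumed to be an accelerator string
    output_dir="outputs/run_0",                           # hypothetical output directory
    save_every_n_train_steps=1000,                        # or None to disable periodic checkpointing
)
# trainer.fit(litmodule, datamodule=datamodule)
```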
- cneuromax.fitting.deeplearning.utils.lightning.set_batch_size_and_num_workers(trainer, datamodule, litmodule, device, output_dir)[source]¶
Sets attribute values for a BaseDataModule.

See find_good_per_device_batch_size() and find_good_per_device_num_workers() for more details on how these variables' values are determined. (A wiring sketch follows this entry.)

- Parameters:
  - output_dir (str) – See output_dir.
- Return type:
  None
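The sketch below shows how this setter might be called; trainer, datamodule and litmodule are assumed to have been built elsewhere (e.g. from Hydra configs), and the attribute names in the final comment are assumptions based on the variable names above rather than documented attributes.

```python
from cneuromax.fitting.deeplearning.utils.lightning import set_batch_size_and_num_workers

# `trainer`, `datamodule` (a BaseDataModule) and `litmodule` are assumed to
# have been instantiated elsewhere, e.g. via instantiate_trainer() and
# Hydra-built configs.
set_batch_size_and_num_workers(
    trainer=trainer,
    datamodule=datamodule,
    litmodule=litmodule,
    device="gpu",                # assumed accelerator string
    output_dir="outputs/run_0",  # hypothetical output directory
)
# After the call, the datamodule is expected to carry the probed
# `per_device_batch_size` and `per_device_num_workers` values
# (assumed attribute names).
```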
- cneuromax.fitting.deeplearning.utils.lightning.find_good_per_device_batch_size(litmodule, datamodule, device, device_ids, output_dir)[source]¶
Probes a per_device_batch_size value.

This functionality makes the following assumptions, which are not always correct but generally reasonable (a generic sketch of such a probe follows this entry):

- As long as the total_batch_size / dataset_size ratio remains small (e.g. < 0.01, so as to benefit from the stochasticity of gradient updates), running the same number of gradient updates with a larger batch size will yield better training performance than doing so with a smaller batch size.
- Loading data from disk to RAM is a larger bottleneck than loading data from RAM to GPU VRAM.
- If you are training on multiple GPUs, each GPU has roughly the same amount of VRAM.

- Parameters:
  - device_ids (list[int]) – See lightning.pytorch.Trainer.device_ids.
  - output_dir (str) – See output_dir.
- Return type:
  int
- Returns:
  A roughly optimal per_device_batch_size value.
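The listing below is a generic, self-contained sketch of the probing idea described above (doubling the batch size until it either exhausts device memory or exceeds a small fraction of the dataset size), not the cneuromax implementation; the helper name, the toy dataset and the 0.01 ratio cap are illustrative.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyDataset(Dataset):
    """Synthetic stand-in for a real datamodule's training dataset."""

    def __len__(self):
        return 10_000

    def __getitem__(self, idx):
        return torch.randn(64), torch.randn(1)


def probe_per_device_batch_size(model, dataset, device="cpu", max_ratio=0.01):
    """Doubles the batch size until OOM or until it exceeds max_ratio of the dataset."""
    limit = max(1, int(len(dataset) * max_ratio))  # keep total_batch_size / dataset_size small
    model = model.to(device)
    batch_size, best = 1, 1
    while batch_size <= limit:
        try:
            x, y = next(iter(DataLoader(dataset, batch_size=batch_size)))
            loss = torch.nn.functional.mse_loss(model(x.to(device)), y.to(device))
            loss.backward()  # forward + backward to exercise peak memory usage
            model.zero_grad(set_to_none=True)
            best = batch_size
            batch_size *= 2  # double until OOM or the dataset-ratio cap
        except RuntimeError:  # typically a CUDA out-of-memory error
            break
    return best


if __name__ == "__main__":
    print(probe_per_device_batch_size(torch.nn.Linear(64, 1), ToyDataset()))
```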
- cneuromax.fitting.deeplearning.utils.lightning.find_good_per_device_num_workers(datamodule, per_device_batch_size, max_num_data_passes=100)[source]¶
Probes a per_device_num_workers value.

Iterates through a range of num_workers values, measures the time it takes to iterate through a fixed number of data passes with each, and returns the value that yields the shortest time (a generic timing sketch follows this entry).

- Parameters:
  - per_device_batch_size (int) – The return value of find_good_per_device_batch_size().
  - max_num_data_passes (int, default: 100) – Maximum number of data passes to iterate through.
- Return type:
  int
- Returns:
  A roughly optimal per_device_num_workers value.
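Below is a generic, self-contained sketch of the timing-based search described above, not the cneuromax implementation; the candidate values, the toy dataset and the interpretation of one "data pass" as one batch are assumptions.

```python
import time

import torch
from torch.utils.data import DataLoader, TensorDataset


def probe_per_device_num_workers(
    dataset, per_device_batch_size, max_num_data_passes=100, candidates=(0, 1, 2, 4)
):
    """Times each candidate num_workers value and returns the fastest one."""
    timings = {}
    for num_workers in candidates:
        loader = DataLoader(
            dataset, batch_size=per_device_batch_size, num_workers=num_workers
        )
        start, passes = time.perf_counter(), 0
        while passes < max_num_data_passes:
            for _ in loader:  # one "data pass" == one batch here (an assumption)
                passes += 1
                if passes >= max_num_data_passes:
                    break
        timings[num_workers] = time.perf_counter() - start
    return min(timings, key=timings.get)  # value with the shortest total time wins


if __name__ == "__main__":  # guard needed when num_workers > 0 spawns worker processes
    dataset = TensorDataset(torch.randn(4096, 64), torch.randn(4096, 1))
    print(probe_per_device_num_workers(dataset, per_device_batch_size=64))
```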
- class cneuromax.fitting.deeplearning.utils.lightning.InitOptimParamsCheckpointConnector(trainer)[source]¶
Bases: _CheckpointConnector

Tweaked Lightning checkpoint connector.

Allows making use of the instantiated optimizers' hyper-parameters rather than the checkpointed hyper-parameters. Useful when resuming training with different optimizer hyper-parameters (e.g. with a PBT Hydra Sweeper).
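A minimal sketch of how this connector might be attached to a Trainer follows; assigning to the private _checkpoint_connector attribute is an assumption about the intended wiring, not something stated in this documentation.

```python
from lightning.pytorch import Trainer

from cneuromax.fitting.deeplearning.utils.lightning import (
    InitOptimParamsCheckpointConnector,
)

trainer = Trainer(max_epochs=10)
# Assumed wiring: replace the Trainer's default checkpoint connector.
trainer._checkpoint_connector = InitOptimParamsCheckpointConnector(trainer)

# Resuming from a checkpoint should now keep the hyper-parameters of the
# freshly instantiated optimizers (e.g. a new learning rate proposed by a
# PBT Hydra Sweeper) instead of the ones stored in the checkpoint.
# trainer.fit(litmodule, datamodule=datamodule, ckpt_path="path/to.ckpt")
```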