lightning¶
Lightning utilities.
- cneuromax.fitting.deeplearning.utils.lightning.instantiate_trainer(trainer_partial, logger_partial, device, output_dir, save_every_n_train_steps)[source]¶
Instantiates trainer_partial.

- Parameters:
  - trainer_partial (partial[Trainer])
  - logger_partial (partial[WandbLogger])
  - output_dir (str) – See output_dir.
  - save_every_n_train_steps (int | None) – See save_every_n_train_steps.
- Return type:
  Trainer
- Returns:
  A lightning.pytorch.Trainer instance.
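A minimal usage sketch follows; the partial contents, the device string, the output directory and the W&B project name are illustrative assumptions rather than values taken from the cneuromax source, and any remaining Trainer/WandbLogger arguments are presumably filled in by instantiate_trainer itself.

```python
from functools import partial

from lightning.pytorch import Trainer
from lightning.pytorch.loggers import WandbLogger

from cneuromax.fitting.deeplearning.utils.lightning import instantiate_trainer

trainer = instantiate_trainer(
    trainer_partial=partial(Trainer, max_epochs=10),      # pre-bound Trainer arguments (illustrative)
    logger_partial=partial(WandbLogger, project="demo"),  # hypothetical W&B project name
    device="gpu",                                         # assumed to be an accelerator string
    output_dir="outputs/run_0",                           # hypothetical output directory
    save_every_n_train_steps=1000,                        # or None to disable periodic checkpointing
)
# trainer.fit(litmodule, datamodule=datamodule)
```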
- cneuromax.fitting.deeplearning.utils.lightning.set_batch_size_and_num_workers(trainer, datamodule, litmodule, device, output_dir)[source]¶
Sets attribute values for a BaseDataModule.

See find_good_per_device_batch_size() and find_good_per_device_num_workers() for more details on how these variables' values are determined. (A wiring sketch follows this entry.)

- Parameters:
  - output_dir (str) – See output_dir.
- Return type:
  None
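The sketch below shows how this setter might be called; trainer, datamodule and litmodule are assumed to have been built elsewhere (e.g. from Hydra configs), and the attribute names in the final comment are assumptions based on the variable names above rather than documented attributes.

```python
from cneuromax.fitting.deeplearning.utils.lightning import set_batch_size_and_num_workers

# `trainer`, `datamodule` (a BaseDataModule) and `litmodule` are assumed to
# have been instantiated elsewhere, e.g. via instantiate_trainer() and
# Hydra-built configs.
set_batch_size_and_num_workers(
    trainer=trainer,
    datamodule=datamodule,
    litmodule=litmodule,
    device="gpu",                # assumed accelerator string
    output_dir="outputs/run_0",  # hypothetical output directory
)
# After the call, the datamodule is expected to carry the probed
# `per_device_batch_size` and `per_device_num_workers` values
# (assumed attribute names).
```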
- cneuromax.fitting.deeplearning.utils.lightning.find_good_per_device_batch_size(litmodule, datamodule, device, device_ids, output_dir)[source]¶
Probes a per_device_batch_size value.

This functionality makes the following assumptions, which are not always correct but generally reasonable (a generic sketch of such a probe follows this entry):

- As long as the total_batch_size / dataset_size ratio remains small (e.g. < 0.01, so as to benefit from the stochasticity of gradient updates), running the same number of gradient updates with a larger batch size will yield better training performance than doing so with a smaller batch size.
- Loading data from disk to RAM is a larger bottleneck than loading data from RAM to GPU VRAM.
- If you are training on multiple GPUs, each GPU has roughly the same amount of VRAM.

- Parameters:
  - device_ids (list[int]) – See lightning.pytorch.Trainer.device_ids.
  - output_dir (str) – See output_dir.
- Return type:
  int
- Returns:
  A roughly optimal per_device_batch_size value.
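The listing below is a generic, self-contained sketch of the probing idea described above (doubling the batch size until it either exhausts device memory or exceeds a small fraction of the dataset size), not the cneuromax implementation; the helper name, the toy dataset and the 0.01 ratio cap are illustrative.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyDataset(Dataset):
    """Synthetic stand-in for a real datamodule's training dataset."""

    def __len__(self):
        return 10_000

    def __getitem__(self, idx):
        return torch.randn(64), torch.randn(1)


def probe_per_device_batch_size(model, dataset, device="cpu", max_ratio=0.01):
    """Doubles the batch size until OOM or until it exceeds max_ratio of the dataset."""
    limit = max(1, int(len(dataset) * max_ratio))  # keep total_batch_size / dataset_size small
    model = model.to(device)
    batch_size, best = 1, 1
    while batch_size <= limit:
        try:
            x, y = next(iter(DataLoader(dataset, batch_size=batch_size)))
            loss = torch.nn.functional.mse_loss(model(x.to(device)), y.to(device))
            loss.backward()  # forward + backward to exercise peak memory usage
            model.zero_grad(set_to_none=True)
            best = batch_size
            batch_size *= 2  # double until OOM or the dataset-ratio cap
        except RuntimeError:  # typically a CUDA out-of-memory error
            break
    return best


if __name__ == "__main__":
    print(probe_per_device_batch_size(torch.nn.Linear(64, 1), ToyDataset()))
```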
- cneuromax.fitting.deeplearning.utils.lightning.find_good_per_device_num_workers(datamodule, per_device_batch_size, max_num_data_passes=100)[source]¶
Probes a per_device_num_workers value.

Iterates through a range of num_workers values, measures the time it takes to iterate through a fixed number of data passes with each, and returns the value that yields the shortest time (a generic timing sketch follows this entry).

- Parameters:
  - per_device_batch_size (int) – The return value of find_good_per_device_batch_size().
  - max_num_data_passes (int, default: 100) – Maximum number of data passes to iterate through.
- Return type:
  int
- Returns:
  A roughly optimal per_device_num_workers value.
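Below is a generic, self-contained sketch of the timing-based search described above, not the cneuromax implementation; the candidate values, the toy dataset and the interpretation of one "data pass" as one batch are assumptions.

```python
import time

import torch
from torch.utils.data import DataLoader, TensorDataset


def probe_per_device_num_workers(
    dataset, per_device_batch_size, max_num_data_passes=100, candidates=(0, 1, 2, 4)
):
    """Times each candidate num_workers value and returns the fastest one."""
    timings = {}
    for num_workers in candidates:
        loader = DataLoader(
            dataset, batch_size=per_device_batch_size, num_workers=num_workers
        )
        start, passes = time.perf_counter(), 0
        while passes < max_num_data_passes:
            for _ in loader:  # one "data pass" == one batch here (an assumption)
                passes += 1
                if passes >= max_num_data_passes:
                    break
        timings[num_workers] = time.perf_counter() - start
    return min(timings, key=timings.get)  # value with the shortest total time wins


if __name__ == "__main__":  # guard needed when num_workers > 0 spawns worker processes
    dataset = TensorDataset(torch.randn(4096, 64), torch.randn(4096, 1))
    print(probe_per_device_num_workers(dataset, per_device_batch_size=64))
```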
- class cneuromax.fitting.deeplearning.utils.lightning.InitOptimParamsCheckpointConnector(trainer)[source]¶
Bases: _CheckpointConnector

Tweaked Lightning checkpoint connector.

Allows making use of the instantiated optimizers' hyper-parameters rather than the checkpointed hyper-parameters. Useful when resuming training with different optimizer hyper-parameters (e.g. with a PBT Hydra Sweeper).
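A minimal sketch of how this connector might be attached to a Trainer follows; assigning to the private _checkpoint_connector attribute is an assumption about the intended wiring, not something stated in this documentation.

```python
from lightning.pytorch import Trainer

from cneuromax.fitting.deeplearning.utils.lightning import (
    InitOptimParamsCheckpointConnector,
)

trainer = Trainer(max_epochs=10)
# Assumed wiring: replace the Trainer's default checkpoint connector.
trainer._checkpoint_connector = InitOptimParamsCheckpointConnector(trainer)

# Resuming from a checkpoint should now keep the hyper-parameters of the
# freshly instantiated optimizers (e.g. a new learning rate proposed by a
# PBT Hydra Sweeper) instead of the ones stored in the checkpoint.
# trainer.fit(litmodule, datamodule=datamodule, ckpt_path="path/to.ckpt")
```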